filter list fast

Sat Mar 18 08:26:35 EST 2006

Diez B. Roggisch wrote:
>>Both of these techniques are O(n^2).  You can reduce it to O(n log n)
>>by using sets:
>>
>>>>>set2 = set(list2)
>>>>>[x for x in list1 if x not in set2]
>>
>>Checking to see if an item is in a set is much more efficient than a
>>list.
> 
> Is the set-lookup reliably O(log n)? I was under the impression that it is
> hash-based, and this should be O(1) usually, but couldbve O(n) worst-case
> (hash the same for _all_ entries).

That's largely a theoretical concern.  Google for something like

   '''dict worst-case performance "tim peters"'''

to learn more.  (The third article there (no doubt obsolete in some 
ways, given that it was in 2000) says that Python "keeps at least 1/3 of 
the internal hash table entries unused, making collisions very rarely a 
problem...  It's possible to contrive keys that will cause collisions 
systematically ... but unlikely to happen by accident in 2.0")

-Peter