List performance and CSV

Wed Oct 12 02:59:02 EDT 2005

jepler at unpythonic.net wrote:
> But a really fast approach is to use a dictionary or other structure
> that turns the inner loop into a fast lookup, not a slow loop through
> the 'Customers' list. 

Another approach is to sort both sequences, loop over
both in one loop and just update the index for the smaller
item. Something like this (below, I'm assuming no duplicates
in the lists, and I don't know if that's true for your [2]):

 >>> l1 = (1,2,4,6,7,8,9)
 >>> l2 = (2,5,6,7,9,10)
 >>> i1 = i2 = 0
 >>> while 1:
...     if l1[i1]==l2[i2]:
...         print l1[i1], 'is in both'
...         i1 += 1; i2 += 1
...     elif l1[i1]<l2[i2]:
...         i1 += 1
...     else:
...         i2 += 1
...
2 is in both
6 is in both
7 is in both
9 is in both
Traceback (most recent call last):
   File "<stdin>", line 2, in ?
IndexError: tuple index out of range

Unless I'm terribly confused, this will lead to m+n iterations
in the worst case. (Of course, the sort operations are something
like O(n log n) I guess.)

By the way, you might be able to use sets too: How fast is
Set.intersection?