NEWBIE QUESTIION: Comparing Lists in Python

Thu May 2 11:26:23 EDT 2002

Mark McEahern wrote:

> [Alex Martelli]
>> The way to get O(N) behavior is to construct auxiliary dictionaries:
>>
>> dict1 = dict(zip(list1,list1))
>> returnList2 = [x for x in list2 if x not in dict1]
>>
>> "x not in C" takes O(N) time if container C is a sequence with N
>> elements, but (roughly) O(1) time if C is a dictionary.  Building a
>> dictionary of N items is roughly O(N).
> 
> Alex, thanks for the zip tip.  I don't usually reach for that particular
> hammer--this usage broadens my horizons.

You're welcome!

> I whipped together the program below to time the difference between a raw
> compare and a compare with an auxiliary dictionary (in the program I
> contrast these two approaches as raw and zip).  It seems that with a small
> list size (10-100), the performance boost of the zip is negligible?

*yes*!  The performance boost is actually with the *dict* -- the zip is
just part of the O(N) price you _pay_ to get the big-O-boost.  It being
a big-O-boost, it starts showing up for "large enough" lists -- and only
measurement can practically determine what's "large enough" in one
particular case.

> $ time_not_in.py 100 10
> raw: 0.004 seconds.
> zip: 0.001 seconds.

looks like 300% better in this case -- not too bad after all, although
describing it as "negligible" may generally be correct.  Still, if I
could speed up every program by a "negligible" four times I might not
mind too much:-).

> $ time_not_in.py 1000 100
> raw: 0.394 seconds.
> zip: 0.010 seconds.

Here, the speedup of about 40 times (4000% or so) I would not describe
as "negligible" any more.  Matter of taste, I guess.

> $ time_not_in.py 10000 1000
> raw: 44.949 seconds.
> zip: 0.176 seconds.

I'm glad we seem to agree that a speedup by a few hundred times
(tens of thousands percent) is NOT negligible:-).

Alex