NEWBIE QUESTIION: Comparing Lists in Python

Alex Martelli aleax at aleax.it
Thu May 2 05:28:58 EDT 2002


logistix wrote:

> The most general way would be to just copy the list instead of using a
> dict and remove the elements:
> 
>>>> diff = list1[:]
>>>> for item in diff:
> ...  if item in list2:
> ...   diff.remove(item)
> ...
>>>> diff
> [1, 2, 3]

If the length of both lists are proportional to some number N, this
algorithm will take a time of O(N*N*N) -- *cubic*.  One factor of N
comes from the for statement, one from operator 'in' when applied to
a list on the RHS, and one from the remove operation.


> List comprehensions would be my favorite (although many will disagree):
> 
>>>> list1 = [1,2,3,4,5]
>>>> list2 = [4,5,6,7,8]
>>>> returnList1 = [val for val in list1 if val not in list2]
>>>> returnList2 = [val for val in list2 if val not in list1]

This is O(N*N), since you don't have the N factor from the remove.

The way to get O(N) behavior is to construct auxiliary dictionaries:

dict1 = dict(zip(list1,list1))
returnList2 = [x for x in list2 if x not in dict1]

"x not in C" takes O(N) time if container C is a sequence with N
elements, but (roughly) O(1) time if C is a dictionary.  Building a
dictionary of N items is roughly O(N).


Premature optimization is the root of all evil (Knuth) -- but
that basically means MICRO-optimization, trying to shave 20% or
80% off some algorithm for some fixed big-O behavior.  Getting
the right big-O behavior is generally not a wasted effort, unless
you can guarantee that all containers around will be quite small.

Python generally makes getting sane big-O behavior quite easy,
thanks to the superb performance of dictionaries above all (and,
in other contexts, the great performance of list's sort method,
and a few other typical hotspots).  You do have to know the big
tricks and pitfalls, such as "x in sequence", somelist.remove,
building strings out of pieces with + or += -- these are the
big three typical timesinks of hastily written Python code.


Alex




More information about the Python-list mailing list