NEWBIE QUESTIION: Comparing Lists in Python
Mark McEahern
marklists at mceahern.com
Thu May 2 09:45:38 EDT 2002
[Alex Martelli]
> The way to get O(N) behavior is to construct auxiliary dictionaries:
>
> dict1 = dict(zip(list1,list1))
> returnList2 = [x for x in list2 if x not in dict1]
>
> "x not in C" takes O(N) time if container C is a sequence with N
> elements, but (roughly) O(1) time if C is a dictionary. Building a
> dictionary of N items is roughly O(N).
Alex, thanks for the zip tip. I don't usually reach for that particular
hammer--this usage broadens my horizons.
I whipped together the program below to time the difference between a raw
compare and a compare with an auxiliary dictionary (in the program I
contrast these two approaches as raw and zip). It seems that with a small
list size (10-100), the performance boost of the zip is negligible?
Here are some results:
Usage: time_not_in.py count share.
count is the size of each list.
share is the number of elements they share.
$ time_not_in.py 100 10
raw: 0.004 seconds.
zip: 0.001 seconds.
$ time_not_in.py 1000 100
raw: 0.394 seconds.
zip: 0.010 seconds.
$ time_not_in.py 10000 1000
raw: 44.949 seconds.
zip: 0.176 seconds.
Cheers,
// mark
#! /usr/bin/env python
import time
def not_in_list_zip(l1, l2):
"""
Return a list of elements from l1 that are not in l2.
"""
d = dict(zip(l2, l2))
return [x for x in l1 if x not in d]
def not_in_list_raw(l1, l2):
"""
Return a list of elements from l1 that are not in l2.
"""
return [x for x in l1 if x not in l2]
def time_raw(l1, l2):
f = not_in_list_raw
return time_it(f, l1, l2)
def time_zip(l1, l2):
f = not_in_list_zip
return time_it(f, l1, l2)
def time_it(f, l1, l2):
start = time.time()
d1 = f(l1, l2)
d2 = f(l2, l1)
end = time.time()
return d1, d2, end - start
def show_time(s, t):
print "%s: %1.3f seconds." % (s, t)
def main():
import os, sys
prog = os.path.basename(sys.argv[0])
usage = "Usage: %s count share." % prog
default_count = 10
default_share = 2
try:
count = int(sys.argv[1])
except IndexError:
count = default_count
try:
share = int(sys.argv[2])
except IndexError:
share = default_share
l1 = range(0, count)
l2 = range(count - share, 2 * count - share)
raw_d1, raw_d2, raw_t = time_raw(l1, l2)
zip_d1, zip_d2, zip_t = time_zip(l1, l2)
show_time("raw", raw_t)
show_time("zip", zip_t)
if __name__ == "__main__":
main()
More information about the Python-list
mailing list