[Tutor] Lists...
Roeland Rengelink
r.b.rigilink@chello.nl
Thu, 31 May 2001 09:00:16 +0200
Andrew Wilkins wrote:
>
> Ahh just what I was hoping not to receive =)
> The thing is, the lists I'm using are terribly big. What I'm doing is
> creating ranges of coordinates,
> so the lists generally will vary from 100 to 1000 elements. I'll try the
> dictionary method! If that's slow, do you think a C implementation would be
> worth the trouble? I've never tried implementing modules in C, could be fun
> =)
>
Hi Andrew,
Just out of curiosity I did the timing.
I added a intersection method based on the assumption that the lists
are already sorted
Here's the code:
def intersect1(l1, l2):
'''brute force intersection of two lists'''
return [item for item in l1 if item in l2]
def intersect2(l1, l2):
'''intersection of two lists, using dicts'''
dict = {}
for item in l2:
dict[item] = None
return [item for item in l1 if dict.has_key(item)]
def intersect2a(l1, l2):
'''using dicts with filter, thanks to Remco Gehrlich'''
dict = {}
for item in l2:
dict[item] = 1
return filter(dict.has_key, l1)
def intersect3(l1, l2):
'''intersection of two sorted lists of unique items'''
result = []
lo = 0
hi = len(l2)
for i in l1:
for j in xrange(lo, hi):
n = cmp(l2[j], i)
if n==-1:
lo = j
elif n==1:
break
else:
lo = j
result.append(i)
break
return result
def make_list(n):
'''Make a sorted list of unique ints larger than 1'''
import random
l = []
x = 1
for i in xrange(n):
x = x+random.randrange(1, 5)
l.append(x)
return l
def time_func(intersect_func, l1, l2):
import time
t1 = time.time()
l = intersect_func(l1, l2)
t2 = time.time()-t1
# print l[0:10]
return t2
def test(n):
l1 = make_list(n)
l2 = make_list(n)
print 'Brute force (n=%i), t=%f' % (n, time_func(intersect1, l1,
l2))
print 'Dict lookup (n=%i), t=%f' % (n, time_func(intersect2, l1,
l2))
print 'Dict filter (n=%i), t=%f' % (n, time_func(intersect2a, l1,
l2))
print 'Sorted list (n=%i), t=%f' % (n, time_func(intersect3, l1,
l2))
if __name__=='__main__':
test(10)
test(100)
test(1000)
And this is the result:
Brute force (n=10), t=0.000131
Dict lookup (n=10), t=0.000169
Dict filter (n=10), t=0.000146
Sorted list (n=10), t=0.000450
Brute force (n=100), t=0.006470
Dict lookup (n=100), t=0.001560
Dict filter (n=100), t=0.000723
Sorted list (n=100), t=0.003907
Brute force (n=1000), t=0.626828
Dict lookup (n=1000), t=0.013675
Dict filter (n=1000), t=0.007070
Sorted list (n=1000), t=0.041294
As you can see, Remco's version is fastest (staying at less than .1 sec
for two
10k lists). I minor quibble would be that that method doesn't detect a 0
that's in both lists.
intersect3 is the only reasonable thing I could come up with that
doesn't use dicts. It's also scales linearly, but it's 4 times slower
(and took 100 times longer to write, it had a non-obvious bug) than
intersect2. It may be usefull if your list elements are non-hashable
(can't be dictionary keys).
Note that make_list() produces lists that meet all the rather severe
requirements that some of the intersection function require.
You shouldn't expect a C version to improve this by huge factors.
Hope this helps,
Roeland