[Tutor] Lists...

Thu, 31 May 2001 09:00:16 +0200

Andrew Wilkins wrote:
> 
> Ahh just what I was hoping not to receive =)
> The thing is, the lists I'm using are terribly big. What I'm doing is
> creating ranges of coordinates,
> so the lists generally will vary from 100 to 1000 elements. I'll try the
> dictionary method! If that's slow, do you think a C implementation would be
> worth the trouble? I've never tried implementing modules in C, could be fun
> =)
> 

Hi Andrew,

Just out of curiosity I did the timing.
I added a intersection method based on the assumption that the lists
are already sorted

Here's the code:

def intersect1(l1, l2):
    '''brute force intersection of two lists'''
    return [item for item in l1 if item in l2]

def intersect2(l1, l2):
    '''intersection of two lists, using dicts'''
    dict = {}
    for item in l2:
        dict[item] = None
    return [item for item in l1 if dict.has_key(item)]

def intersect2a(l1, l2):
    '''using dicts with filter, thanks to Remco Gehrlich'''
    dict = {}
    for item in l2:
        dict[item] = 1
    return filter(dict.has_key, l1)

def intersect3(l1, l2):
    '''intersection of two sorted lists of unique items'''
    result = []
    lo = 0
    hi = len(l2)
    for i in l1:
        for j in xrange(lo, hi):
            n = cmp(l2[j], i)
            if n==-1:
                lo = j
            elif n==1:
                break
            else:
                lo = j
                result.append(i)
                break
    return result

def make_list(n):
    '''Make a sorted list of unique ints larger than 1'''
    import random
    l = []
    x = 1
    for i in xrange(n):
        x = x+random.randrange(1, 5)
        l.append(x)
    return l

def time_func(intersect_func, l1, l2):
    import time
    t1 = time.time()
    l = intersect_func(l1, l2)
    t2 = time.time()-t1
#    print l[0:10]
    return t2

def test(n):
    l1 = make_list(n)
    l2 = make_list(n)
    print 'Brute force (n=%i), t=%f' % (n, time_func(intersect1, l1,
l2))
    print 'Dict lookup (n=%i), t=%f' % (n, time_func(intersect2, l1,
l2))
    print 'Dict filter (n=%i), t=%f' % (n, time_func(intersect2a, l1,
l2))
    print 'Sorted list (n=%i), t=%f' % (n, time_func(intersect3, l1,
l2))

if __name__=='__main__':
    test(10)
    test(100)
    test(1000)

And this is the result:

Brute force (n=10), t=0.000131
Dict lookup (n=10), t=0.000169
Dict filter (n=10), t=0.000146
Sorted list (n=10), t=0.000450
Brute force (n=100), t=0.006470
Dict lookup (n=100), t=0.001560
Dict filter (n=100), t=0.000723
Sorted list (n=100), t=0.003907
Brute force (n=1000), t=0.626828
Dict lookup (n=1000), t=0.013675
Dict filter (n=1000), t=0.007070
Sorted list (n=1000), t=0.041294

As you can see, Remco's version is fastest (staying at less than .1 sec
for two
10k lists). I minor quibble would be that that method doesn't detect a 0
that's in both lists.

intersect3 is the only reasonable thing I could come up with that
doesn't use dicts. It's also scales linearly, but it's 4 times slower
(and took 100 times longer to write, it had a non-obvious bug) than
intersect2. It may be usefull if your list elements are non-hashable
(can't be dictionary keys).

Note that make_list() produces lists that meet all the rather severe
requirements that some of the intersection function require.

You shouldn't expect a C version to improve this by huge factors.

Hope this helps,

Roeland