comparing lists

Wed May 8 23:46:00 EDT 2002

Here's a much simpler version that demonstrates the problem ...

--

import string
from time import clock
from random import randrange

dSize = 3
delim = ''.join(map(chr, [randrange(0,256) for _ in range(dSize)]))

def DelaneyCompare(l1, l2):
    l1 = map(string.lower, l1)
    l2 = map(string.lower, l2)
    l1.sort()
    l2.sort()
    return l1 == l2

def LuluCompare(l1, l2):
    l1.sort()
    l2.sort()
    s1 = delim.join(l1).lower()
    s2 = delim.join(l2).lower()
    return s1 == s2

if __name__=='__main__':

    l1 = ['a', 'B']
    l2 = ['A', 'b']

    start = clock()
    print 'Delaney Result:', DelaneyCompare(l1, l2)
    print 'Time:', clock()-start

    start = clock()
    print 'Lulu Result:   ', LuluCompare(l1, l2)
    print 'Time:', clock()-start

---------- Run ----------
Delaney Result: 1
Time: 0.000247198842009
Lulu Result:    0
Time: 9.83796172198e-005

In addition, the results seen with LuluCompare were heavily influenced by
the fact that by having

SIZE = 100000
l1 = ['aaaaa' for x in range(SIZE)]
l2 = ['aaaaa' for x in range(SIZE)]

you end up getting 200000 references to exactly the same object (the
interned string 'aaaaa'). This will greatly speed up any sort routine
(comparison of identity only required). DelaneyCompare OTOH was generating
an additional 200000 different string objects, then comparing those.

Always make sure your test data accurately reflects reality.

Tim Delaney