comparing lists

Lulu of the Lotus-Eaters mertz at gnosis.cx
Wed May 8 22:34:20 EDT 2002


|> From: Lulu of the Lotus-Eaters [mailto:mertz at gnosis.cx]
|> ...What you probably want to do is add a delimiter
|> between the elements that you are pretty sure will not occur
                                     ^^^^^^^^^^^
"Delaney, Timothy" <tdelaney at avaya.com> wrote previously:
|Make that 100% certain will not occur in the original strings ...

Nah...  "pretty sure" is good enough.  Specifically, if you use a random
delimiter of length N, and a list of length K, you can be sure with a
confidence of roughly K/(256**N).  N doesn't have to get very large for
pretty sure to be pretty *damn* sure :-).

|I would suggest just sticking with converting to lowercase, sorting, then
|comparing.
|l1 = map(string.lower, l1)    # or l1 = [x.lower() for x in l1]
|l2 = map(string.lower, l2)    # or l2 = [x.lower() for x in l2]
|l1.sort()
|l2.sort()

Well... here's the thing:

    import string
    from time import clock
    from random import randrange

    dSize = 3
    delim = ''.join(map(chr, [randrange(0,256) for _ in range(dSize)]))

    SIZE = 100000
    l1 = ['aaaaa' for x in range(SIZE)]
    l2 = ['aaaaa' for x in range(SIZE)]

    def DelaneyCompare(l1, l2):
        l1 = map(string.lower, l1)
        l2 = map(string.lower, l2)
        l1.sort()
        l2.sort()
        return (l1==l2)

    def LuluCompare(l1, l2):
        l1.sort()
        l2.sort()
        s1 = delim.join(l1).lower()
        s2 = delim.join(l2).lower()
        return (s1==s2)

    if __name__=='__main__':
        start = clock()
        print 'Comparison Result:', DelaneyCompare(l1, l2)
        print 'Time:', clock()-start

        start = clock()
        print 'Comparison Result:', LuluCompare(l1, l2)
        print 'Time:', clock()-start

I like my version about 10x better :-).

Yours, Lulu...

P.S. Yeah...  I realized I cheated.  The lowercasing is so much faster
for the whole string than in a map() across all the little strings.  In
my hasty defense:  if the strings really are insensitive, why not just
store them in a lower() version when they are added to the list in the
first place?

P.P.S. If I stop cheating, I like my version about 0.95x as well as
Tim's :-).

--
mertz@  | The specter of free information is haunting the `Net!  All the
gnosis  | powers of IP- and crypto-tyranny have entered into an unholy
.cx     | alliance...ideas have nothing to lose but their chains.  Unite
        | against "intellectual property" and anti-privacy regimes!
-------------------------------------------------------------------------






More information about the Python-list mailing list