comparing lists
Lulu of the Lotus-Eaters
mertz at gnosis.cx
Wed May 8 22:34:20 EDT 2002
|> From: Lulu of the Lotus-Eaters [mailto:mertz at gnosis.cx]
|> ...What you probably want to do is add a delimiter
|> between the elements that you are pretty sure will not occur
^^^^^^^^^^^
"Delaney, Timothy" <tdelaney at avaya.com> wrote previously:
|Make that 100% certain will not occur in the original strings ...
Nah... "pretty sure" is good enough. Specifically, if you use a random
delimiter of length N, and a list of length K, you can be sure with a
confidence of roughly K/(256**N). N doesn't have to get very large for
pretty sure to be pretty *damn* sure :-).
|I would suggest just sticking with converting to lowercase, sorting, then
|comparing.
|l1 = map(string.lower, l1) # or l1 = [x.lower() for x in l1]
|l2 = map(string.lower, l2) # or l2 = [x.lower() for x in l2]
|l1.sort()
|l2.sort()
Well... here's the thing:
import string
from time import clock
from random import randrange
dSize = 3
delim = ''.join(map(chr, [randrange(0,256) for _ in range(dSize)]))
SIZE = 100000
l1 = ['aaaaa' for x in range(SIZE)]
l2 = ['aaaaa' for x in range(SIZE)]
def DelaneyCompare(l1, l2):
l1 = map(string.lower, l1)
l2 = map(string.lower, l2)
l1.sort()
l2.sort()
return (l1==l2)
def LuluCompare(l1, l2):
l1.sort()
l2.sort()
s1 = delim.join(l1).lower()
s2 = delim.join(l2).lower()
return (s1==s2)
if __name__=='__main__':
start = clock()
print 'Comparison Result:', DelaneyCompare(l1, l2)
print 'Time:', clock()-start
start = clock()
print 'Comparison Result:', LuluCompare(l1, l2)
print 'Time:', clock()-start
I like my version about 10x better :-).
Yours, Lulu...
P.S. Yeah... I realized I cheated. The lowercasing is so much faster
for the whole string than in a map() across all the little strings. In
my hasty defense: if the strings really are insensitive, why not just
store them in a lower() version when they are added to the list in the
first place?
P.P.S. If I stop cheating, I like my version about 0.95x as well as
Tim's :-).
--
mertz@ | The specter of free information is haunting the `Net! All the
gnosis | powers of IP- and crypto-tyranny have entered into an unholy
.cx | alliance...ideas have nothing to lose but their chains. Unite
| against "intellectual property" and anti-privacy regimes!
-------------------------------------------------------------------------
More information about the Python-list
mailing list