Performance problem with filtering
Skip Montanaro
skip at pobox.com
Wed Mar 13 23:01:41 EST 2002
gh> results = []
gh> for entry in b:
gh> if entry not in a:
gh> results.append(entry)
gh> is terribly slow.
Here's one idea using dicts. Depending on what your lists contain (sounds
like filenames?) you might have to adjust things a bit to make them
dict-worthy.
adict = {}
for fname in a: adict[fname] = 1
results = []
for fname in b:
if fname not in adict:
results.append(fname)
or, pre-2.2:
adict = {}
for fname in a: adict[fname] = 1
results = []
for fname in b:
if no adict.has_key(fname):
results.append(fname)
This at least gets your algorithm out of the O(n**2) realm. Should run a
lot faster.
--
Skip Montanaro (skip at pobox.com - http://www.mojam.com/)
More information about the Python-list
mailing list