Performance problem with filtering

Wed Mar 13 23:01:41 EST 2002

    gh> results = []
    gh> for entry in b:
    gh>     if entry not in a:
    gh>         results.append(entry)

    gh> is terribly slow.

Here's one idea using dicts.  Depending on what your lists contain (sounds
like filenames?) you might have to adjust things a bit to make them
dict-worthy.

    adict = {}
    for fname in a: adict[fname] = 1
    results = []
    for fname in b:
        if fname not in adict:
            results.append(fname)

or, pre-2.2:

    adict = {}
    for fname in a: adict[fname] = 1
    results = []
    for fname in b:
        if no adict.has_key(fname):
            results.append(fname)

This at least gets your algorithm out of the O(n**2) realm.  Should run a
lot faster.

-- 
Skip Montanaro (skip at pobox.com - http://www.mojam.com/)