Performance problem with filtering

Wed Mar 13 23:43:33 EST 2002

Gerhard Häring <gh_pythonlist at gmx.de> writes:

> I have two lists of files (approx. 50000 entries). Now I want to have all the
> entries of list b, that are not in list a. However, the primitive:
> 
> results = []
> for entry in b:
>     if entry not in a:
>         results.append(entry)
> 
> is terribly slow. I mean *really* slow. Any recommendations on how
> to optimize this? Wouldn't it be nice if I could simply do
> b.removeall(a)?

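Method 1: use a dictionary, as several people suggested.  Put the
entries of a in it as keys, so that each "not in" test becomes a hash
lookup instead of a scan of the whole list.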
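
A minimal sketch of Method 1, assuming the entries are hashable (file
names as strings are).  The function name filter_with_dict is made up
for illustration; in current Python a set would serve the same purpose
as the dictionary.

```python
def filter_with_dict(a, b):
    """Return the entries of b that are not in a, keeping b's order.

    Hypothetical helper, not part of any library.
    """
    # Dictionary keyed on the entries of a; the values are unused.
    seen = dict.fromkeys(a)
    results = []
    for entry in b:
        # Dictionary membership is a hash lookup, not a list scan.
        if entry not in seen:
            results.append(entry)
    return results
```

With roughly 50000 entries per list, this does about 50000 lookups
rather than up to 50000 * 50000 comparisons.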

Method 2: sort both lists, then make a single sequential pass through
them selecting out the elements that you want, sort of like the "comm"
command in Unix.  Coding this is left for you as an exercise :)
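
For anyone who wants to compare notes after trying the exercise, here
is one possible sketch of Method 2.  The name filter_sorted is made up
for illustration, and the sketch assumes the entries can be compared
with "<".  Note that the result comes back in sorted order rather than
in b's original order, and duplicates in b are kept.

```python
def filter_sorted(a, b):
    """Return the entries of b that are not in a, in sorted order.

    Hypothetical helper, not part of any library.
    """
    sa = sorted(a)
    sb = sorted(b)
    results = []
    i = 0  # current position in sa
    for entry in sb:
        # Skip everything in sa that sorts before the current entry.
        while i < len(sa) and sa[i] < entry:
            i += 1
        # Keep the entry unless sa holds an equal one at this position.
        if i == len(sa) or sa[i] != entry:
            results.append(entry)
    return results
```

The index i only ever moves forward, so after the two sorts the
filtering itself is a single pass over both lists.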