Performance problem with filtering
Paul Rubin
phr-n2002a at nightsong.com
Wed Mar 13 23:43:33 EST 2002
Gerhard Häring <gh_pythonlist at gmx.de> writes:
> I have two lists of files (approx. 50000 entries). Now I want to have all the
> entries of list b, that are not in list a. However, the primitive:
>
> results = []
> for entry in b:
>     if entry not in a:
>         results.append(entry)
>
> is terribly slow. I mean *really* slow. Any recommendations on how
> to optimize this? Wouldn't it be nice if I could simply do
> b.removeall(a)?
Method 1: use a dictionary, as several people suggested, so that each
membership test is a hash lookup instead of a scan through all of list a.
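A minimal sketch of the dictionary approach, assuming the entries are
hashable strings such as file names. The sample lists and the name "seen"
are made up for illustration; they are not from the original post.

```python
# Hypothetical sample data standing in for the two 50000-entry file lists.
a = ["x.txt", "y.txt", "z.txt"]
b = ["w.txt", "x.txt", "q.txt", "z.txt"]

# Build the lookup table once. Only the keys matter; the value 1 is a dummy.
seen = {}
for entry in a:
    seen[entry] = 1

# Each "not in seen" test is now a hash lookup, not a scan of list a.
results = []
for entry in b:
    if entry not in seen:
        results.append(entry)

print(results)
```

The results keep the order the entries had in b. In later Python versions a
built-in set made from a would serve the same purpose as the dictionary.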
Method 2: sort both lists, then make a single sequential pass through
them selecting out the elements that you want, sort of like the "comm"
command in Unix. Coding this is left for you as an exercise :)
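For anyone who wants to compare notes on the exercise, here is one possible
sketch of the sort-and-scan idea. The function name sorted_difference is
hypothetical, and it assumes the entries can be compared with "<", as strings
can.

```python
def sorted_difference(a, b):
    """Return the entries of b that are not in a, in sorted order.

    Hypothetical helper: one way to code the sort-then-scan idea.
    """
    a_sorted = sorted(a)
    b_sorted = sorted(b)
    results = []
    i = 0  # current position in a_sorted; it only ever moves forward
    for entry in b_sorted:
        # Skip everything in a that sorts before the current entry of b.
        while i < len(a_sorted) and a_sorted[i] < entry:
            i += 1
        # Keep the entry unless a has an equal element at this position.
        if i == len(a_sorted) or a_sorted[i] != entry:
            results.append(entry)
    return results


print(sorted_difference(["x.txt", "y.txt", "z.txt"],
                        ["w.txt", "x.txt", "q.txt", "z.txt"]))
```

Unlike the dictionary version, this returns the entries in sorted order
rather than in the order they appeared in b. After the two sorts, the scan
itself touches each element of both lists at most once.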