Performance problem with filtering

Gerhard Häring gh_pythonlist at gmx.de
Wed Mar 13 23:08:38 EST 2002


On 14/03/02 at 14:54, Delaney, Timothy wrote:
> > From: Gerhard Häring [mailto:gh_pythonlist at gmx.de]
> > 
> > I have two lists of files (approx. 50000 entries). Now I want to have
> > all the entries of list b that are not in list a. However, the primitive:
> > 
> > results = []
> > for entry in b:
> >     if entry not in a:
> >         results.append(entry)
> > 
> > is terribly slow. I mean *really* slow. Any recommendations on how to
> > optimize this? Wouldn't it be nice if I could simply do b.removeall(a)?
> 
> Firstly, you are doing 'entry in a' for each entry in b. This is a linear
> search through a ... *very* slow. [...]

Ok, so the __contains__ method of lists does a linear search. I should have
known.

Of course I did think about just using a dictionary in the first place; it just
seemed inappropriate because I didn't really need any mapping of keys to values
for this application.
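
Something like this, I mean (untested sketch; 'lookup' is just an arbitrary
name, and the dictionary values are dummies - only the constant-time key
lookup matters):

# Build a dictionary from list a; the values are irrelevant,
# we only want constant-time key lookups.
lookup = {}
for entry in a:
    lookup[entry] = None

# Keep every entry of b that does not appear in a.
# Duplicates in b survive, because only a was converted.
results = [entry for entry in b if entry not in lookup]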

Has anybody written a list subclass yet that is optimized for __contains__? Btw.,
this is different from a dictionary, as a list can contain the same value
twice.
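
What I have in mind is roughly something like this (untested sketch; FastList
is a made-up name, it assumes the entries are hashable, and only append() and
remove() are kept in sync with the side dictionary here):

class FastList(list):
    """List subclass with constant-time membership tests.

    A side dictionary maps each element to its count, so duplicates
    are still allowed (unlike a plain dictionary of keys).  A complete
    version would also have to cover extend(), insert(), __delitem__()
    and friends.
    """
    def __init__(self, iterable=()):
        list.__init__(self, iterable)
        self._counts = {}
        for item in self:
            self._counts[item] = self._counts.get(item, 0) + 1

    def append(self, item):
        list.append(self, item)
        self._counts[item] = self._counts.get(item, 0) + 1

    def remove(self, item):
        list.remove(self, item)   # raises ValueError if item is absent
        if self._counts[item] == 1:
            del self._counts[item]
        else:
            self._counts[item] -= 1

    def __contains__(self, item):
        return item in self._counts

# Usage: build it once from a, then filter b against it.
fast_a = FastList(a)
results = [entry for entry in b if entry not in fast_a]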

Gerhard 
-- 
This sig powered by Python!
Outside temperature in München: 7.0 °C      Wind: 1.4 m/s
