Performance problem with filtering
Gerhard Häring
gh_pythonlist at gmx.de
Wed Mar 13 23:08:38 EST 2002
On 14/03/02 at 14:54, Delaney, Timothy wrote:
> > From: Gerhard Häring [mailto:gh_pythonlist at gmx.de]
> >
> > I have two lists of files (approx. 50000 entries). Now I want
> > to have all the
> > entries of list b, that are not in list a. However, the primitive:
> >
> > results = []
> > for entry in b:
> >     if entry not in a:
> >         results.append(entry)
> >
> > is terribly slow. I mean *really* slow. Any recommendations
> > on how to optimize
> > this? Wouldn't it be nice if I could simply do b.removeall(a)?
>
> Firstly, you are doing 'entry in a' for each entry in b. This is a linear
> search through a ... *very* slow. [...]
Ok, so the __contains__ method of lists does a linear search. I should have
known.
Of course I did think about just using a dictionary in the first place, it just
seemed inappropriate, because I didn't really need any mapping of keys to
values for this application.
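For the record, a minimal sketch of the dictionary idea (the file names here
are made-up placeholders; the values stored in the dict are irrelevant, only
key membership matters):

```python
# Hypothetical sample data standing in for the two file lists.
a = ["spam.txt", "eggs.txt", "ham.txt"]
b = ["eggs.txt", "bacon.txt", "spam.txt", "bacon.txt"]

# Build a lookup table from list a; 'entry in lookup' is an
# average-case O(1) hash lookup instead of a linear scan.
lookup = {}
for entry in a:
    lookup[entry] = None

results = []
for entry in b:
    if entry not in lookup:
        results.append(entry)
```

Note that the filtered list keeps duplicates from b (here, "bacon.txt" twice);
only the lookup side is collapsed into dictionary keys.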
Has anybody written a subclassed list yet that is optimized for __contains__?
Btw. this is different from a dictionary, as a list can contain the same value
twice.
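One way such a subclass could work, as a hedged sketch rather than a finished
implementation: keep the list sorted and use the standard bisect module, so
__contains__ becomes a binary search. Duplicates are allowed, unlike
dictionary keys; the class name and insert_sorted method are my invention.

```python
import bisect

class SortedList(list):
    """A list kept in sorted order so membership tests can use
    binary search (O(log n)) instead of a linear scan."""

    def insert_sorted(self, value):
        # Insert value while preserving sort order.
        bisect.insort(self, value)

    def __contains__(self, value):
        # Binary-search for the leftmost possible position,
        # then check whether the value is actually there.
        i = bisect.bisect_left(self, value)
        return i < len(self) and self[i] == value

s = SortedList()
for v in ["b", "a", "c", "a"]:
    s.insert_sorted(v)
# s is now ["a", "a", "b", "c"]: the duplicate "a" survives,
# and "a" in s / "z" in s are logarithmic-time tests.
```

The price is that every insertion costs O(n) for the underlying list shift,
so this only pays off when lookups vastly outnumber insertions.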
Gerhard
--
This sig powered by Python!
Outside temperature in München: 7.0 °C Wind: 1.4 m/s