Most efficient solution?

Bernhard Herzog bh at intevation.de
Mon Jul 16 10:39:34 EDT 2001


alf at leo.logilab.fr (Alexandre Fayolle) writes:

> On Mon, 16 Jul 2001 09:19:09 -0400, Jay Parlar <jparlar at home.com> wrote:
> >List B consists of my "stopwords", meaning, the words I don't want included in my final version of list A. So what I need to 
> >do is check every item in list A, and if it occurs in list B, then I want to remove it from the final version of A. My first thought 
> >would be:
> >
> >for eachItem in A:
> >    if eachItem in B:
> >        A.remove(eachItem)
> >
> 
> You may get some speedup by making B a dictionary, and using has_key() to
> see if the word is there. This should get you an average O(1) lookup instead
> of O(n) inside the loop. To gain further performance, use filter to skim A.
> 
> C = {}
> for item in B:
>     C[item]=None
> 
> A = filter(lambda e, dic = C: not dic.has_key(e), A)
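
Two details in the quoted exchange may be easier to see in a small sketch. The remove-inside-the-loop approach skips the element that slides into the slot just vacated, and a hashed container gives average constant-time lookups. The sample lists below are made up for illustration, and the code assumes a modern Python 3 with the built-in set type rather than the interpreter of the time:

```python
# Hypothetical sample data, not from the original question.
A = ["the", "a", "quick", "fox"]
B = ["the", "a"]

# Removing from a list while iterating over it: after "the" is
# removed, "a" moves to index 0 and the loop never looks at it.
buggy = list(A)
for item in buggy:
    if item in B:
        buggy.remove(item)

# Building a new list avoids the skipping problem, and a set gives
# average O(1) membership tests instead of an O(n) scan of B.
stop = set(B)
cleaned = [item for item in A if item not in stop]

print(buggy)    # "a" survives even though it is a stopword
print(cleaned)
```

Building a new list rather than mutating A in place also avoids the repeated cost of list.remove(), which has to search and shift the list on every call.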

A bit more elegant, perhaps, and a little faster still would be to use 1
as the value in C and directly use C.get in filter:

C = {}
for item in B:
    C[item] = 1

A = filter(C.get, A)
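
One caveat, sketched below under the assumption of a modern Python 3: filter keeps the items for which the function returns a true value, so filter(C.get, A) returns the words of A that are in C, which is the opposite of dropping stopwords. itertools.filterfalse (not available in the Python of this thread) keeps the direct use of C.get while inverting the test. The sample lists are invented for illustration:

```python
from itertools import filterfalse

# Hypothetical sample data.
A = ["the", "quick", "fox", "a"]
B = ["the", "a"]

# Same mapping as the loop above: every stopword maps to 1.
C = dict.fromkeys(B, 1)

# filter keeps items where C.get(item) is true, i.e. the stopwords.
kept = list(filter(C.get, A))

# filterfalse keeps items where C.get(item) is false (None here),
# i.e. everything that is not a stopword.
without_stopwords = list(filterfalse(C.get, A))

print(kept)
print(without_stopwords)
```

This only works because the stored value 1 is true; with None as the value, C.get would return a false value for every word and the two results would no longer separate stopwords from the rest.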

-- 
Intevation GmbH                                 http://intevation.de/
Sketch                                 http://sketch.sourceforge.net/
MapIt!                                               http://mapit.de/


