Most efficient solution?
Jeffery D. Collins
jcollins at boulder.net
Mon Jul 16 11:44:21 EDT 2001
On Mon, Jul 16, 2001 at 03:41:21PM +0100, Simon Brunning wrote:
> > From: Simon Brunning
> > From: Jeffery D. Collins [SMTP:jcollins at boulder.net]
> > > List B consists of my "stopwords", meaning, the words I don't want
> > > included in my final version of list A. So what I need to
> > > do is check every item in list A, and if it occurs in list B, then
> > > I want to remove it from the final version of A. My first thought
> > > would be:
> > >
> > > for eachItem in A:
> > >     if eachItem in B:
> > >         A.remove(eachItem)
> > >
> >
> > How about:
> >
> > map(A.remove, B)
> >
> > Clever! Unfortunately, it dies (giving a ValueError) if any of the tokens
> > in B are not present in A. Nasty, but how about:
> >
> > map((A + B).remove, B)
>
> That will teach me to post without testing. This doesn't work at all - items
> from B are removed from the temporary, unnamed list created for A + B.
>
> A bit nearer the truth would be:
>
> A += B
> map(A.remove, B)
>
> But this doesn't really work, either. Only the first occurrence of each
> token in B is removed.
>
> It strikes me that if B is large, it might be quicker to build a re pattern
> from its elements, and match on that, rather than use 'in'. I'll give that a
> bash later.
>
How about this:
import operator
h = {}
map(operator.setitem, [h]*len(B), B, B) # create a dictionary of B
bb = filter(h.get, A) # find all elements of A in B (include repeats)
map(A.remove, bb)
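
For anyone reading this on Python 3: map() and filter() are lazy there, so the
map() calls above would do nothing unless their results were consumed. A rough
equivalent of the same idea (index B for fast lookup, then drop every matching
element of A) might look like the sketch below. It swaps the dictionary for a
set, and the function name remove_stopwords and the sample data are made up
for illustration, not taken from the thread.

```python
def remove_stopwords(words, stopwords):
    """Return a new list without any occurrence of a stopword."""
    stop = set(stopwords)  # average O(1) membership tests
    # Building a new list avoids mutating `words` while iterating over it.
    return [w for w in words if w not in stop]


# Hypothetical sample data: "the" is repeated in A, "an" never occurs in A.
A = ["the", "quick", "the", "fox", "a", "jumps"]
B = ["the", "a", "an"]

A[:] = remove_stopwords(A, B)  # slice assignment updates A in place
print(A)  # ['quick', 'fox', 'jumps']
```

Unlike map(A.remove, B), this removes repeated tokens and does not raise
ValueError when a stopword is missing from A. It also makes a single pass over
A, whereas each A.remove() call rescans the list from the start.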
--
Jeffery Collins (http://www.boulder.net/~jcollins)