Most efficient solution?

Mon Jul 16 11:44:21 EDT 2001

On Mon, Jul 16, 2001 at 03:41:21PM +0100, Simon Brunning wrote:
> > From:	Simon Brunning 
> > 	From:	Jeffery D. Collins [SMTP:jcollins at boulder.net]
> > 	> List B consists of my "stopwords", meaning, the words I don't want
> > 	> included in my final version of list A. So what I need to
> > 	> do is check every item in list A, and if it occurs in list B, then
> > 	> I want to remove it from the final version of A. My first thought
> > 	> would be:
> > 	> 
> > 	> for eachItem in A:
> > 	>     if eachItem in B:
> > 	>         A.remove(eachItem)
> > 	> 
> > 
> > 	How about: 
> > 
> > 	map(A.remove, B)
> >  
> > Clever! Unfortunately, it dies (giving a ValueError) if any of the tokens
> > in B are not present in A. Nasty, but how about:
> > 
> > map((A + B).remove, B)
>  
> That will teach me to post without testing. This doesn't work at all - items
> from B are removed from the temporary, unnamed list created for A + B.
> 
> A bit nearer the truth would be:
> 
> A += B
> map(A.remove, B)
> 
> But this doesn't really work, either. Only the first occurrence of each
> token in B is removed.
> 
> It strikes me that if B is large, it might be quicker to build a re pattern
> from its elements, and match on that, rather than use 'in'. I'll give that a
> bash later.
> 
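The regular-expression idea mentioned above was never posted in the thread. Here is a minimal Python 3 sketch of it. It assumes that a token should be dropped only when it equals a stopword exactly, not when it merely contains one. The function name `remove_stopwords_re` is hypothetical. Nothing here establishes that this beats a dictionary or set lookup, so time it on real data before relying on it.

```python
import re

def remove_stopwords_re(words, stopwords):
    # Hypothetical helper: keep only the tokens that are not exactly a stopword.
    if not stopwords:
        # An empty alternation would match the empty string, so skip the regex entirely.
        return list(words)
    # Join the escaped stopwords into one non-capturing alternation.
    # re.escape stops characters such as '.' from acting as regex metacharacters.
    pattern = re.compile("(?:" + "|".join(re.escape(w) for w in stopwords) + ")")
    # fullmatch requires the whole token to match one alternative, not just a prefix of it.
    return [w for w in words if not pattern.fullmatch(w)]
```

For example, `remove_stopwords_re(["the", "cat", "a", "mat"], ["the", "a"])` keeps only `"cat"` and `"mat"`.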

How about this:

import operator
h = {}
map(operator.setitem, [h]*len(B), B, B)  # create a dictionary of B
bb = filter(h.get, A)  # find all elements of A in B (include repeats)
map(A.remove, bb)  # bb keeps repeats, so every occurrence is removed
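On Python 3, `map()` and `filter()` return lazy iterators. The three lines above would therefore build nothing and remove nothing unless their results are consumed. A rough Python 3 rendering of the same approach follows, with a few substitutions:

- The function names are hypothetical.
- `dict.fromkeys` stands in for the `map(operator.setitem, ...)` trick.
- `x in h` replaces `h.get`, so a falsy stopword such as `''` is not silently skipped.

The second function is an alternative, not part of the post. It builds a set once and rebuilds `A` in a single pass instead of calling `A.remove` for every match.

```python
def remove_stopwords_like_post(A, B):
    # Python 3 rendering of the three steps in the post above.
    h = dict.fromkeys(B)            # dictionary keyed by the stopwords in B
    bb = [x for x in A if x in h]   # every element of A that is also in B, repeats kept
    for x in bb:                    # explicit loop: a bare map() is lazy in Python 3
        A.remove(x)                 # drops the first remaining occurrence each time
    return A

def remove_stopwords_one_pass(A, B):
    # Alternative (not from the thread): filter A in one pass against a set.
    stop = set(B)                            # average O(1) membership tests
    A[:] = [x for x in A if x not in stop]   # slice assignment updates A in place
    return A
```

Both functions modify `A` in place. The first performs one `list.remove` scan per matching element, so its cost grows roughly with len(A) times the number of matches. The second is roughly linear in len(A) + len(B). Rebuilding the list also avoids the problem with the loop in the original question: removing items from A while iterating over it skips the element that follows each removal.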

-- 
Jeffery Collins (http://www.boulder.net/~jcollins)