Most efficient solution?

Tue Jul 17 14:23:35 EDT 2001

On Mon, 16 Jul 2001 23:19:07 +0100, Gareth.McCaughan at pobox.com (Gareth McCaughan) wrote:

>Jay Parlar wrote:
>[...]
>[He has two lists of words, a long one (A) and a shorter one (B).
>He wants to get the result of removing every instance of each word
>in list B from list A.]
>
>Many people have suggested a dictionary-based solution. Here's
>another approach, which turns out to be slower (for me, anyway).
>
>Build a regular expression which matches a word exactly when
>it isn't in list B. It should look like "(?!....$)"
>where "...." matches exactly the words in list B.
>(This is a "negative lookahead assertion". Actually it's not
>quite right to say that it "matches a word"; it matches a
>zero-length substring when it's *not* followed by something
>that matches "...." and then the end of the string!)
>
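For reference, here is a minimal sketch of the lookahead filter described in the quoted text. It assumes list A is already split into words and that list B holds plain words; the names `alternatives`, `not_in_B` and `kept` are made up for illustration, and `re.escape` is added only as a precaution against regex metacharacters.

```python
import re

B = ['one', 'two', 'three']   # the shorter list: words to remove
A = 'apple one banana orange two three pineapple'.split()

# Build the "...." part: an alternation matching exactly the words in B.
alternatives = '|'.join(re.escape(w) for w in B)

# Zero-width match at the start of a word, unless the whole word is in B.
not_in_B = re.compile(r'(?!(?:%s)$)' % alternatives)

# match() returns a (truthy) match object for words to keep, None otherwise.
kept = [w for w in A if not_in_B.match(w)]
print(kept)
```

This tests each word separately, so it is still one regex call per word of A.
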
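I had an idea using regular expressions too, but applying the
regex in the original split, so that the unwanted words (C below,
playing the role of list B) are treated as separators. E.g.,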

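 >>> import re
 >>> C = ['one', 'two', 'three']   # words to remove
 >>> a = 'apple one banana orange two three pineapple'
 >>> # A = a.split() would be the plain split; we don't use it
 >>> # separator: any word in C plus trailing whitespace, or plain whitespace
 >>> rx = re.compile((r'\b%s\s+|'*len(C)+r'\s+') % tuple(C))
 >>> re.split(rx,a)
 ['apple', '', 'banana', 'orange', '', '', 'pineapple']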

Then you could just filter out the empty strings when you
use the result, whatever that use is. Maybe someone will
think of a regex that eliminates the empty strings. I've got
to do something else now ;-)
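
One possible way to avoid the empty strings, offered as a sketch rather than a drop-in answer: swap `re.split` for `re.findall` and match the words to keep instead of the separators. The names `banned` and `rx_keep` are hypothetical, and "word" here is assumed to mean a whitespace-delimited run of characters.

```python
import re

C = ['one', 'two', 'three']
a = 'apple one banana orange two three pineapple'

banned = '|'.join(re.escape(w) for w in C)

# (?<!\S)            only start matching at the beginning of a word
# (?!(?:...)(?!\S))  refuse the match if the whole word is one of C
# \S+                otherwise take the word
rx_keep = re.compile(r'(?<!\S)(?!(?:%s)(?!\S))\S+' % banned)

words = rx_keep.findall(a)
print(words)   # ['apple', 'banana', 'orange', 'pineapple']
```

Since nothing is ever matched for a word in C, there are no empty strings to filter afterwards, and a word from C at the very start or end of the string is handled the same way as one in the middle.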

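It would be pretty inefficient to do this, since it joins the
pieces back together and splits them a second time just to drop
the empty strings: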
 >>> (' '.join(re.split(rx,a))).split()
 ['apple', 'banana', 'orange', 'pineapple']
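
A cheaper variant, sketched with the same example data: drop the empty strings from the split result directly in one pass, rather than joining and splitting again. The pattern is the same one as in the session above; only the name `words` is new.

```python
import re

C = ['one', 'two', 'three']
a = 'apple one banana orange two three pineapple'

# Same separator pattern: a word from C plus trailing whitespace, or whitespace.
rx = re.compile((r'\b%s\s+|' * len(C) + r'\s+') % tuple(C))

# Keep only the non-empty pieces of the split.
words = [w for w in rx.split(a) if w]
print(words)   # ['apple', 'banana', 'orange', 'pineapple']
```

Note that this pattern requires whitespace after each word in C, so a word from C at the very end of the string would not be removed.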