Replace stop words (remove words from a string)

bearophileHUGS at lycos.com
Thu Jan 17 09:37:15 EST 2008


Raymond Hettinger:
> Regular expressions should do the trick.
> >>> stoppattern = '|'.join(map(re.escape, stoplist))
> >>> re.sub(stoppattern, '', mystr)

If there are many stop words, and many of them share prefixes, then
that RE can be optimized with a trie-based strategy, like the one in
the Perl module Regexp::List ("List" for short):
http://search.cpan.org/~dankogai/Regexp-Optimizer-0.15/lib/Regexp/List.pm

"List" is used by something more complex called "Optimizer", which is
overkill for the OP's problem:
http://search.cpan.org/~dankogai/Regexp-Optimizer-0.15/lib/Regexp/Optimizer.pm

I don't know if a Python module similar to "List" is available; I may
write one :-)
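
For what it's worth, here is a minimal sketch of the trie idea in
Python. It is not a port of Regexp::List, only an illustration: the
name trie_regex and the nested-dict trie are my own choices, and the
\b word boundaries in the usage lines are an assumption that is not
part of Raymond's pattern.

```python
import re

def trie_regex(words):
    """Return a regex matching any of `words`, with shared prefixes factored out."""
    # Build the trie as nested dicts; the key "" marks the end of a word.
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}

    def to_pattern(node):
        # A node holding only the end marker contributes nothing more.
        if list(node) == [""]:
            return ""
        # If a word ends here but longer words continue, the rest is optional.
        optional = "" in node
        alts = [re.escape(ch) + to_pattern(child)
                for ch, child in sorted(node.items()) if ch != ""]
        if len(alts) == 1 and not optional:
            return alts[0]
        body = "(?:" + "|".join(alts) + ")"
        return body + "?" if optional else body

    return to_pattern(trie)

stoplist = ["the", "then", "there", "this"]
print(trie_regex(stoplist))  # th(?:e(?:n|re)?|is)

# Assumption: whole-word matching via \b, which the quoted pattern does not do.
stoppattern = r"\b(?:" + trie_regex(stoplist) + r")\b"
print(re.sub(stoppattern, "", "this is the end then"))
```

The resulting pattern makes the RE engine test each shared prefix once
instead of once per alternative, which should matter only when the
stop list is long.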

Bye,
bearophile


