Replace stop words (remove words from a string)
bearophileHUGS at lycos.com
Thu Jan 17 09:37:15 EST 2008
Raymond Hettinger:
> Regular expressions should do the trick.
> >>> stoppattern = '|'.join(map(re.escape, stoplist))
> >>> re.sub(stoppattern, '', mystr)
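For reference, a self-contained version of the quoted approach might look like the sketch below. The stop list and the input string are made-up examples, and the `\b` word boundaries are an addition that is not in the quoted code: without them a stop word like "a" would also be stripped from the middle of "another".

```python
import re

# Hypothetical data, chosen only to illustrate the idea.
stoplist = ["the", "a", "of"]
mystr = "the name of a rose, another theme"

# Join the escaped stop words into one alternation. The \b anchors are an
# assumption added here so that only whole words are removed.
stoppattern = r'\b(?:' + '|'.join(map(re.escape, stoplist)) + r')\b'

# Remove the stop words, then collapse the leftover runs of whitespace.
result = ' '.join(re.sub(stoppattern, '', mystr).split())
print(result)
```

The whitespace cleanup at the end is also optional; a plain re.sub() leaves the spaces that surrounded each removed word.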
If there are many stop words, and many of them share common prefixes,
then that RE can be optimized with a trie-based strategy, which factors
the shared prefixes out of the alternation, like this one called "List":
http://search.cpan.org/~dankogai/Regexp-Optimizer-0.15/lib/Regexp/List.pm
"List" is used by something more complex called "Optimizer" that's
overkill for the OP problem:
http://search.cpan.org/~dankogai/Regexp-Optimizer-0.15/lib/Regexp/Optimizer.pm
I don't know whether a Python module similar to "List" is available; I
may write one :-)
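To give an idea of what such a module could do, here is a minimal sketch of the trie idea in Python. It is not a port of Regexp::List and makes no claim to match its output; the function names (build_trie, trie_to_regex, stop_pattern, remove_stop_words) and the stop list are all hypothetical. The words are stored in a nested-dict trie, and the trie is then walked to emit a regex in which shared prefixes appear only once.

```python
import re

def build_trie(words):
    """Build a nested-dict trie; the key '' marks the end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[''] = True
    return trie

def trie_to_regex(node):
    """Return a regex fragment matching every word suffix stored below node."""
    if set(node) == {''}:
        return ''                      # leaf: nothing left to match
    optional = '' in node              # a word ends here, the rest is optional
    alternatives = [re.escape(ch) + trie_to_regex(node[ch])
                    for ch in sorted(k for k in node if k)]
    if len(alternatives) == 1 and not optional:
        return alternatives[0]         # single branch: no group needed
    pattern = '(?:' + '|'.join(alternatives) + ')'
    return pattern + '?' if optional else pattern

def stop_pattern(stoplist):
    """Compile a whole-word pattern (the \\b anchors are an assumption)."""
    return re.compile(r'\b(?:' + trie_to_regex(build_trie(stoplist)) + r')\b')

def remove_stop_words(text, pattern):
    """Delete the matches and collapse the whitespace left behind."""
    return ' '.join(pattern.sub('', text).split())

stoplist = ["the", "then", "there", "this"]   # hypothetical stop words
print(trie_to_regex(build_trie(stoplist)))
print(remove_stop_words("this is the other thing there", stop_pattern(stoplist)))
```

With this stop list the four alternatives collapse into a single pattern that starts with "th" once, so the regex engine does not have to retry the common prefix for each word. Whether that is measurably faster than the plain alternation depends on the stop list and would need timing on real data.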
Bye,
bearophile