Replace stop words (remove words from a string)

Raymond Hettinger python at rcn.com
Thu Jan 17 03:49:54 EST 2008


On Jan 17, 12:25 am, BerlinBrown <berlin.br... at gmail.com> wrote:
> if I have an array of "stop" words, and I want to replace those values
> with something else;
> mystr = kljsldkfjksjdfjsdjflkdjslkf[BAD]Kkjkkkkjkkjk[BAD]LSKJFKSFJKSJF;L[BAD2]kjsldfsd;
> if I have an array stop_list = [ "[BAD]", "[BAD2]" ]
> I want to replace the values in that list with a zero length string.

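A regular expression handles this in one pass. Escape each stop word with re.escape (the square brackets are regex metacharacters), join the results with '|', and substitute an empty string for every match: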

>>> mystr = 'kljsldkfjksjdfjsdjflkdjslkf[BAD]Kkjkkkkjkkjk[BAD]LSKJFKSFJKSJF;L[BAD2]kjsldfsd;'
>>> stoplist = ["[BAD]", "[BAD2]"]
>>> import re
>>> stoppattern = '|'.join(map(re.escape, stoplist))
>>> re.sub(stoppattern, '', mystr)
'kljsldkfjksjdfjsdjflkdjslkfKkjkkkkjkkjkLSKJFKSFJKSJF;Lkjsldfsd;'
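If one stop word can be a prefix of another (say "[BAD" and "[BAD]"), the order of the alternatives matters: at each position the regex engine takes the first alternative that matches, not the longest. Below is a minimal sketch of a reusable helper, assuming longest-match behaviour is what you want. The name remove_stop_words and its replacement parameter are hypothetical and are not part of the re module.

```python
import re

def remove_stop_words(text, stop_words, replacement=''):
    """Replace every occurrence of any string in stop_words with replacement.

    Hypothetical helper for illustration; not part of the re module.
    """
    # Drop empty strings: an empty alternative would match at every position.
    words = [w for w in stop_words if w]
    if not words:
        return text
    # Longest first, so "[BAD]" is tried before its prefix "[BAD".
    words.sort(key=len, reverse=True)
    # re.escape makes '[' and ']' (and other metacharacters) match literally.
    pattern = re.compile('|'.join(map(re.escape, words)))
    # A function replacement is inserted verbatim, so backslashes in
    # `replacement` are not treated as group references.
    return pattern.sub(lambda m: replacement, text)
```

>>> remove_stop_words('a[BAD]b', ['[BAD', '[BAD]'])
'ab'

With the unsorted pattern, '[BAD' would match first and leave a stray ']' behind, giving 'a]b'. Compiling the pattern once also makes sense if the same stop list is applied to many strings.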

Raymond



More information about the Python-list mailing list