Replace stop words (remove words from a string)
Raymond Hettinger
python at rcn.com
Thu Jan 17 03:49:54 EST 2008
On Jan 17, 12:25 am, BerlinBrown <berlin.br... at gmail.com> wrote:
> if I have an array of "stop" words, and I want to replace those values
> with something else;
> mystr =
> kljsldkfjksjdfjsdjflkdjslkf[BAD]Kkjkkkkjkkjk[BAD]LSKJFKSFJKSJF;L[BAD2]kjsldfsd;
> if I have an array stop_list = [ "[BAD]", "[BAD2]" ]
> I want to replace the values in that list with a zero length string.
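Regular expressions should do the trick. Escape each stop word with
re.escape (the square brackets are regex metacharacters), join the
results with '|', and substitute the empty string.
Try this: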
>>> mystr = 'kljsldkfjksjdfjsdjflkdjslkf[BAD]Kkjkkkkjkkjk[BAD]LSKJFKSFJKSJF;L[BAD2]kjsldfsd;'
>>> stoplist = ["[BAD]", "[BAD2]"]
>>> import re
>>> stoppattern = '|'.join(map(re.escape, stoplist))
>>> re.sub(stoppattern, '', mystr)
'kljsldkfjksjdfjsdjflkdjslkfKkjkkkkjkkjkLSKJFKSFJKSJF;Lkjsldfsd;'
Raymond
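
The session above can be wrapped in a small reusable function. What follows is only a minimal sketch, not part of the original reply: the name remove_stop_words is hypothetical, and the longest-first sort is an added assumption to cover stop words where one is a prefix of another (for example "BAD" and "BAD2"), since regex alternation tries alternatives from left to right.

```python
import re

def remove_stop_words(text, stop_list):
    """Return text with every literal occurrence of a stop word removed."""
    if not stop_list:
        # An empty alternation would match the empty string everywhere;
        # with nothing to remove, return the text unchanged.
        return text
    # Try longer words first so "BAD2" wins over its prefix "BAD".
    ordered = sorted(stop_list, key=len, reverse=True)
    # re.escape makes characters such as "[" and "]" match literally.
    pattern = re.compile('|'.join(re.escape(word) for word in ordered))
    return pattern.sub('', text)

mystr = 'kljsldkfjksjdfjsdjflkdjslkf[BAD]Kkjkkkkjkkjk[BAD]LSKJFKSFJKSJF;L[BAD2]kjsldfsd;'
cleaned = remove_stop_words(mystr, ["[BAD]", "[BAD2]"])
print(cleaned)
```

For only a handful of stop words, a plain loop calling str.replace on each word would also work; the single compiled pattern just handles everything in one pass over the string.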