Regular expression to match whole words.
Tim Peters
tim_one at email.msn.com
Wed Sep 27 04:59:05 EDT 2000
[Simon Brunning]
> I'm trying to build a regular expression to match a list of whole words.
>
> If I don't care about whole words, it's easy - I just use:
>
> words = ['Spam', 'egg', 'chips']
> rePattern = '|'.join(map(re.escape, words))
>
> ..and it's fine. The problem with this is that it will match on
> 'smegg' and suchlike. ...
This is what "word-boundary assertions" are for, spelled \b. Add this line:
rePattern = r"\b(" + rePattern + r")\b"
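Here is a minimal sketch that puts the quoted snippet and that line together, so you can see the difference on a made-up sample string. The names `naive`, `whole` and `text` are illustrative only.

```python
import re

words = ['Spam', 'egg', 'chips']
# Escape each word so any regex metacharacters in it are matched literally.
rePattern = '|'.join(map(re.escape, words))

# Plain alternation: 'egg' is also found inside 'smegg', 'Spam' inside 'Spamalot'.
naive = re.compile(rePattern)
# Wrap the alternation in a group so the \b on each side applies to every word.
whole = re.compile(r"\b(" + rePattern + r")\b")

text = "smegg, egg and chips, but not Spamalot"
print(naive.findall(text))  # ['egg', 'egg', 'chips', 'Spam']
print(whole.findall(text))  # ['egg', 'chips']
```

The parentheses matter: without them, r"\bSpam|egg|chips\b" parses as "\bSpam" OR "egg" OR "chips\b", so only the first and last words would get a boundary check.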
Don't leave out the "r"s! A \b is a 0-width match that succeeds between a word (\w) and a non-word (\W) character, or at the start or end of the buffer when the character next to it is a word character. If you leave out the "r"s, each \b will be treated as a backspace control character (\x08) instead, and the result probably won't match anything!
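A small sketch of both points, using made-up sample strings: where \b actually matches, and what goes wrong when the "r" prefix is dropped. The names `raw` and `cooked` are just for illustration.

```python
import re

# \b is zero-width: it matches positions, not characters.
print([m.start() for m in re.finditer(r"\b", "egg, spam")])  # [0, 3, 5, 9]
# With no word characters at all, there is no boundary anywhere.
print(re.search(r"\b", "!?"))  # None

rePattern = '|'.join(map(re.escape, ['Spam', 'egg', 'chips']))
raw = r"\b(" + rePattern + r")\b"
cooked = "\b(" + rePattern + ")\b"  # here "\b" is the backspace character

print(len(r"\b"), len("\b"))  # 2 1
print("\b" == "\x08")         # True
print(re.findall(raw, "egg and chips"))     # ['egg', 'chips']
print(re.findall(cooked, "egg and chips"))  # []
# The cooked pattern only matches words wrapped in literal backspaces.
print(re.findall(cooked, "\x08egg\x08"))    # ['egg']
```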
don't-shoot-the-messenger<wink>-ly y'rs - tim