Regular expression to match whole words.

Tim Peters tim_one at email.msn.com
Wed Sep 27 04:59:05 EDT 2000


[Simon Brunning]
> I'm trying to build a regular expression to match a list of whole words.
>
> If I don't care about whole words, it's easy - I just use:
>
> words = ['Spam', 'egg', 'chips']
> rePattern = '|'.join(map(re.escape, words))
>
> ..and it's fine. The problem with this is that it will match on
> 'smegg' and suchlike. ...

This is what "word-boundary assertions" are for, spelled \b.  Add this line:

    rePattern = r"\b(" + rePattern + r")\b"

Don't leave out the "r"s!  A \b is a 0-width match that succeeds between a
word (\w) and non-word (\W) character, or at the start or end of a buffer
when the adjacent character is a word character.  If you leave out the "r"s,
each \b will get treated like a backspace control character instead, and the
result probably won't match anything!
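
Here is a minimal sketch of both points, in case it helps to see them run.
The sample string `text` and the names `alternation`, `whole_word` and
`broken` are made up for illustration and are not from Simon's code:

```python
import re

words = ['Spam', 'egg', 'chips']
alternation = '|'.join(map(re.escape, words))

# Raw strings keep \b as a word-boundary assertion.
whole_word = re.compile(r"\b(" + alternation + r")\b")

# Without the r prefix, "\b" is a single backspace character (chr(8)),
# so the pattern demands literal backspaces around the word.
broken = re.compile("\b(" + alternation + ")\b")

text = "smegg and chips, then egg"  # hypothetical sample input

print(re.findall(alternation, text))  # ['egg', 'chips', 'egg']
print(whole_word.findall(text))       # ['chips', 'egg']
print(broken.findall(text))           # []
print("\b" == chr(8), len(r"\b"))     # True 2
```

The unanchored pattern picks up the 'egg' inside 'smegg'; the \b version
skips it, and the version without the "r"s finds nothing at all.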

don't-shoot-the-messenger<wink>-ly y'rs  - tim





