Hopefully simple regular expression question

peterbe at gmail.com peterbe at gmail.com
Tue Jun 14 07:01:58 EDT 2005


I want to match a word against a string such that 'peter' is found in
"peter bengtsson" or " hey peter," but not in "thepeter bengtsson" or
"hey peterbe," because the word has to stand on its own. The following
code works for a single word:

import re


def createStandaloneWordRegex(word):
    """Return a regular expression that finds 'peter' only if it is
    written alone (next to a space, the start or end of the string, a
    comma, etc.) but not if it is inside another word like 'peterbe'."""
    # re.I ignores case, re.M lets ^ and $ match at line breaks, and
    # re.X (verbose mode) lets the pattern be laid out over several lines.
    # re.L is left out: Python 3 only accepts it for bytes patterns.
    return re.compile(r"""
      (
      ^ %s
      (?=\W | $)
      |
      (?<=\W)
      %s
      (?=\W | $)
      )
      """ % (word, word), re.I | re.M | re.X)


def test_createStandaloneWordRegex():
    def T(word, text):
        print(createStandaloneWordRegex(word).findall(text))

    T("peter", "So Peter Bengtsson wrote this")
    T("peter", "peter")
    T("peter bengtsson", "So Peter Bengtsson wrote this")


test_createStandaloneWordRegex()

The result of running this is::

 ['Peter']
 ['peter']
 []   <--- this is the problem!!


It works if the parameter is just one word (e.g. 'peter') but stops
working when it's a phrase of several words (e.g. 'peter bengtsson').
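The symptom comes from the re.X (verbose) flag. Verbose mode ignores unescaped whitespace in the pattern, and that includes the space that "%s" copies in from 'peter bengtsson'. The compiled pattern therefore looks for "peterbengtsson". The minimal Python 3 sketch below shows this. The variable names are only illustrative:

```python
import re

# In verbose mode the unescaped space is dropped from the pattern.
verbose = re.compile("peter bengtsson", re.I | re.X)
# Without re.X the space is an ordinary literal character.
plain = re.compile("peter bengtsson", re.I)
# re.escape() backslash-escapes the space, so it survives verbose mode.
escaped = re.compile(re.escape("peter bengtsson"), re.I | re.X)

text = "So Peter Bengtsson wrote this"
print(verbose.search(text))                                   # None
print(verbose.search("So PeterBengtsson wrote this").group())  # PeterBengtsson
print(plain.search(text).group())                             # Peter Bengtsson
print(escaped.search(text).group())                           # Peter Bengtsson
```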

How do I modify my regular expression so that it matches multi-word
phrases as well as single words?
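Here is one possible approach, written as a Python 3 sketch. It drops verbose mode so that the space inside the phrase is matched literally. It runs the phrase through re.escape() so that any regex metacharacters in it are treated as plain text. The ^ / (?<=\W) alternation becomes a pair of negative lookarounds. The helper names create_standalone_phrase_regex and find are hypothetical, not from any library:

```python
import re


def create_standalone_phrase_regex(phrase):
    """Compile a case-insensitive regex that finds `phrase` only when it is
    not glued to word characters on either side (hypothetical helper)."""
    # re.escape() turns metacharacters in the phrase (".", "+", ...) into
    # literals; without re.X, the space inside the phrase is kept as is.
    # (?<!\w) and (?!\w) also succeed at the very start/end of the text.
    return re.compile(r"(?<!\w)" + re.escape(phrase) + r"(?!\w)", re.I)


def find(phrase, text):
    """Return every standalone occurrence of `phrase` in `text`."""
    return create_standalone_phrase_regex(phrase).findall(text)


print(find("peter", "So Peter Bengtsson wrote this"))            # ['Peter']
print(find("peter", "peter"))                                    # ['peter']
print(find("peter bengtsson", "So Peter Bengtsson wrote this"))  # ['Peter Bengtsson']
print(find("peter", "thepeter bengtsson"))                       # []
print(find("peter", "hey peterbe,"))                             # []
```

The negative lookarounds are used instead of \b because \b only marks a boundary next to a word character. It would therefore misbehave for a phrase that begins or ends with punctuation. The phrase's own spacing still has to match the text exactly, so "Peter  Bengtsson" with two spaces would not be found.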




More information about the Python-list mailing list