[Tutor] Why doesn't this regex match???

Sheila King sheila@thinkspot.net
Fri, 08 Feb 2002 21:30:15 -0800


OK, I'm having some trouble using the re module for regular
expression matching. (I'm very new to regular expressions, so I
suppose I could be doing something really stupid.)

Here is a session with the interactive interpreter:

Python 2.2 (#28, Dec 21 2001, 12:21:22) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.8 -- press F1 for help
>>> import re
>>> searchstring = 'ADV: FREE FREE OFFERZ!!!!'
>>> pattern = 'adv:'
>>> p = re.compile(r'\b%s\b' % pattern)
>>> result = p.search(searchstring, re.IGNORECASE)
>>> result
>>> print result
None

I would have expected to get a match in the situation above.

Now when I try this:

>>> searchstring = 'Viagra without a prescription!'
>>> pattern = 'viagra'
>>> p = re.compile(r'\b%s\b' % pattern)
>>> result = p.search(searchstring, re.IGNORECASE)
>>> result
>>> print result
None
>>> searchstring = 'get viagra without a prescription!'
>>> pattern = 'viagra'
>>> p = re.compile(r'\b%s\b' % pattern)
>>> result = p.search(searchstring, re.IGNORECASE)
>>> result
<_sre.SRE_Match object at 0x00AF4010>
>>> 

If 'viagra' comes at the beginning, it doesn't match, but if it comes in
the middle it does. So, one starts to think that \b, the word boundary,
won't match at the beginning of a string (which is totally contrary to
what I would expect).
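
One way to check that guess is sketched below. It is written for a
current Python 3 interpreter rather than the 2.2 session above, and the
variable names are only illustrative. It relies on two documented
behaviours of the re module: flags are given to re.compile(), and the
second positional argument of a compiled pattern's search() method is a
start position, not a flags value.

```python
import re

# Flags are passed to re.compile(); with IGNORECASE set there,
# \b matches at the very start of the string.
p = re.compile(r'\bviagra\b', re.IGNORECASE)
at_start = p.search('Viagra without a prescription!')
print(at_start.start())

# pattern.search(string, pos): the second positional argument is a start
# index. re.IGNORECASE has the integer value 2, so the calls below mean
# "search case-sensitively, starting at index 2".
q = re.compile(r'\bviagra\b')
print(int(re.IGNORECASE))
no_flag_start = q.search('Viagra without a prescription!', re.IGNORECASE)
no_flag_middle = q.search('get viagra without a prescription!', re.IGNORECASE)
print(no_flag_start)           # capital 'V' never matches lowercase 'viagra'
print(no_flag_middle.start())  # lowercase 'viagra' is found after index 2

# Separate point for 'adv:': a trailing \b needs a word character on one
# side. ':' followed by a space has none, so the trailing \b cannot match.
adv_strict = re.compile(r'\badv:\b', re.IGNORECASE).search('ADV: FREE FREE OFFERZ!!!!')
adv_loose = re.compile(r'\badv:', re.IGNORECASE).search('ADV: FREE FREE OFFERZ!!!!')
print(adv_strict)
print(adv_loose.group(0))
```

If this reading is right, the session's results come from case
sensitivity and the start position, not from \b failing at the
beginning of a string.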

Please help.

The task I am trying to accomplish right now is this:

I have a list of strings (common words and phrases one might expect to
find in a Spam email, if that wasn't obvious from the above examples)
and I want to do a regular expression search against the subject of an
email and see if I get a match or not (after which I handle the email).
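
A minimal sketch of that task, assuming Python 3. The phrase list and
the function names (build_spam_patterns, find_spam_phrase) are
hypothetical, made up for illustration. It also swaps \b for the
lookarounds (?<!\w) and (?!\w), which is an assumption about the wanted
behaviour: they act as word boundaries but still work when a phrase
begins or ends with punctuation, as 'adv:' does.

```python
import re

# Hypothetical phrase list, based on the examples in this message.
SPAM_PHRASES = ['adv:', 'viagra', 'free offer']

def build_spam_patterns(phrases):
    """Compile one case-insensitive pattern per phrase."""
    patterns = []
    for phrase in phrases:
        # re.escape() keeps any punctuation in the phrase literal.
        # (?<!\w) and (?!\w) require that no word character touches
        # either end of the phrase.
        regex = r'(?<!\w)%s(?!\w)' % re.escape(phrase)
        patterns.append(re.compile(regex, re.IGNORECASE))
    return patterns

def find_spam_phrase(subject, patterns):
    """Return the first matched text in subject, or None if nothing matches."""
    for pattern in patterns:
        match = pattern.search(subject)
        if match:
            return match.group(0)
    return None

patterns = build_spam_patterns(SPAM_PHRASES)
for subject in ['ADV: FREE FREE OFFERZ!!!!',
                'Viagra without a prescription!',
                'Meeting agenda for Monday']:
    print(repr(subject), '->', find_spam_phrase(subject, patterns))
```

Compiling the patterns once and reusing them for every subject line
avoids rebuilding them per message.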

-- 
Sheila King
http://www.thinkspot.net/sheila/

"When introducing your puppy to an adult cat,
restrain the puppy, not the cat." -- Gwen Bailey,
_The Perfect Puppy: How to Raise a Well-behaved Dog_