"\b" behaviour at end of string, was RE: Simple (?) Regular Expression Question

Mon Jan 19 04:25:18 EST 2004

Tim Peters wrote:

> [Steve Zatz]
>> Is '@' a special character in regular expressions?
> 
> Nope.
> 
>> I am asking because I don't understand the following:
>>
>> >>> import re
>> >>> s = ' @'
>> >>> re.sub(r'\b@','*',s)
>> ' @'
>> >>> s = ' a'
>> >>> re.sub(r'\ba','*',s)
>> ' *'
> 
> \b matches a "word boundary", meaning it has to have a word character on
> one side (something that matches \w), and a non-word character on the
> other (something that matches \W), regardless of order.  ' @' contains two
> non-word characters (' ' and '@'), so \b doesn't match anything in it.
> ' a' contains a non-word character (' ') followed by a word character
> ('a'), so \b matches (an empty string) between those two characters.
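To make the rule concrete, here is a small sketch (Python 3, not the 2.3.3 interpreter used below) that computes the \b positions by hand and compares them with re.finditer(). The helpers is_word_char() and boundary_positions() are hypothetical names made up for illustration. The one assumption worth spelling out is that the edges of the string (before the first character and after the last) count as the non-word side, which is how Python's re treats them. That is why ' a' also has a boundary at 2, and "alpha" has one at 5, the very end of the string.

```python
import re

def is_word_char(ch):
    # Hypothetical helper: a single character is a "word" character if \w matches it.
    return re.fullmatch(r"\w", ch) is not None

def boundary_positions(s):
    # Hypothetical helper: every index 0..len(s) where \b would match.
    # Assumption: positions outside the string count as non-word characters.
    positions = []
    for i in range(len(s) + 1):
        left = i > 0 and is_word_char(s[i - 1])
        right = i < len(s) and is_word_char(s[i])
        if left != right:  # word on exactly one side -> boundary
            positions.append(i)
    return positions

for text in [" @", " a", "alpha"]:
    by_hand = boundary_positions(text)
    by_re = [m.start() for m in re.finditer(r"\b", text)]
    # ' @' has no boundaries; ' a' has [1, 2]; 'alpha' has [0, 5]
    print(repr(text), by_hand, by_re)
```

The same rule explains the original question: re.sub(r'\b@', '*', ' @') finds nothing to replace, while re.sub(r'\ba', '*', ' a') replaces the 'a'.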

Playing around with it a bit, I noticed that finditer() never terminates for
the "\b" regular expression: after the match at the end of the string it keeps
yielding empty matches at position 5:

Python 2.3.3 (#1, Jan  3 2004, 13:57:08)
[GCC 3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> r = re.compile(r"\b")
>>> [m.start() for (i, m) in zip(range(10), r.finditer("alpha"))]
[0, 5, 5, 5, 5, 5, 5, 5, 5, 5]
>>>
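For anyone who needs to walk over possibly-empty matches without depending on how a particular finditer() handles them, a minimal sketch is to drive Pattern.search() with an explicit start position and step one character past every empty match. iter_matches() is a hypothetical name, and this is a simplification rather than how re implements finditer(): after an empty match it jumps to the next position, so it can miss a non-empty match that starts at the same spot (Python 3.7+ finditer() would report it). Note that Pattern.search(s, pos) still looks at the character before pos, so \b is judged against the real neighbours rather than treating pos as the start of the string.

```python
import re

def iter_matches(pattern, s):
    # Hypothetical helper: yield successive matches of a compiled pattern,
    # guaranteeing forward progress even when a match is empty.
    pos = 0
    while pos <= len(s):
        m = pattern.search(s, pos)
        if m is None:
            return
        yield m
        # An empty match has m.end() == m.start(); stepping one past it means
        # the same position is never reported twice (the runaway loop above).
        pos = m.end() if m.end() > m.start() else m.end() + 1

r = re.compile(r"\b")
print([m.start() for m in iter_matches(r, "alpha")])  # [0, 5]
```

Current Python 3 releases' own re.finditer(r"\b", "alpha") likewise stops after the matches at 0 and 5.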

Bug or feature?

Peter