regexp non-greedy matching bug?

John Hazen john at hazen.net
Sat Dec 3 23:19:31 EST 2005


I want to match one or two instances of a pattern in a string.

According to the docs for the 're' module 
( http://python.org/doc/current/lib/re-syntax.html ) the '?' qualifier
is greedy by default, and adding a '?' after a qualifier makes it
non-greedy.

> The "*", "+", and "?" qualifiers are all greedy...
> Adding "?" after the qualifier makes it perform the match in
> non-greedy or minimal fashion...
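As a quick check of what the docs describe, here is a minimal sketch
comparing '?' with '??' on their own. The sample string and variable
names are made up for illustration and are not part of my real code.

```python
import re

text = 'foofoo'  # made-up sample string

# Greedy '?': the optional group is taken when it can be.
greedy = re.match(r'(foo)?', text)
# Non-greedy '??': the optional group is skipped when the rest can still match.
lazy = re.match(r'(foo)??', text)

print(repr(greedy.group(0)))  # 'foo'
print(repr(greedy.group(1)))  # 'foo'
print(repr(lazy.group(0)))    # ''
print(repr(lazy.group(1)))    # None
```

In isolation, at least, the two forms behave the way the docs say.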

In the following example, my re is intended to allow for 1 or 2
instances of 'foo', and there are 2 in the string I'm matching.  So, I
would expect group(1) and group(3) to both be populated.  (When I make
the 2nd foo mandatory by removing its '?', the grouping is as I expect.)

$ python2.4
Python 2.4.1 (#2, Mar 31 2005, 00:05:10) 
[GCC 3.3 20030304 (Apple Computer, Inc. build 1666)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> s = 'foobarbazfoobar'
>>> foofoo = re.compile(r'^(foo)(.*?)(foo)?(.*?)$')
>>> foofoo.match(s).group(0)
'foobarbazfoobar'
>>> foofoo.match(s).group(1)
'foo'
>>> foofoo.match(s).group(2)
''
>>> foofoo.match(s).group(3)
>>> foofoo.match(s).group(4)
'barbazfoobar'
>>> foofoo = re.compile(r'^(foo)(.*?)(foo)(.*?)$')
>>> foofoo.match(s).group(0)
'foobarbazfoobar'
>>> foofoo.match(s).group(1)
'foo'
>>> foofoo.match(s).group(2)
'barbaz'
>>> foofoo.match(s).group(3)
'foo'
>>> foofoo.match(s).group(4)
'bar'
>>>
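For anyone who wants to reproduce this outside the interactive prompt,
the session can be condensed into a short script. Here s is taken to be
'foobarbazfoobar', the value group(0) returns; the names optional_foo
and required_foo are just labels for the two patterns.

```python
import re

s = 'foobarbazfoobar'  # taken from the group(0) output in the session

# Second 'foo' is optional (the pattern that surprises me).
optional_foo = re.compile(r'^(foo)(.*?)(foo)?(.*?)$')
# Second 'foo' is required (the pattern that behaves as I expect).
required_foo = re.compile(r'^(foo)(.*?)(foo)(.*?)$')

optional_groups = optional_foo.match(s).groups()
required_groups = required_foo.match(s).groups()

print(optional_groups)  # ('foo', '', None, 'barbazfoobar')
print(required_groups)  # ('foo', 'barbaz', 'foo', 'bar')
```

groups() returns all four groups at once, with None for a group that
did not take part in the match.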


So, is this a bug, or just a problem with my understanding?  If it's my
brain that's broken, what's the proper way to do this with regexps?

And, if the above is expected behavior, should I submit a doc bug?  It's
clear that the "?" qualifier (applied to the second foo group) is _not_
greedy in this situation.

-John


