sre \Z bug or feature?

Tim Peters tim.one at home.com
Tue Jan 2 12:30:58 EST 2001


[posted & mailed]

[Pearu Peterson]
> >>> re.match(r'(?ms).*?}\s*\Z(?P<rest>.*)','{}\012}\012').groupdict()
> {'rest': ''}
>
> but in Python 2.0 the same match gives
>
> >>> re.match(r'(?ms).*?}\s*\Z(?P<rest>.*)','{}\012}\012').groupdict()
> {'rest': '\012}\012'}
>
> which is surprising because according to docs:
>
> \Z  Matches only at the end of the string.
> ...

You may want to add this test case to the bug "New re breaks on some '*?'
matches" opened yesterday:

http://sourceforge.net/bugs/?func=detailbug&bug_id=127259&group_id=5470

> It seems that the `{}' part is somehow responsible to this
> behaviour because
>
> >>> re.match(r'(?ms).*?}\s*\Z(?P<rest>.*)','aa\012}\012').groupdict()
> {'rest': ''}
>
> is expected result and is obtained with Python 2.0.

If you try the test strings:

'}\012\012'
'}}\012\012'
'}}}\012\012'
'}}}}\012\012'
'}}}}}\012\012'

etc., you'll find that the regexp works as expected if there is an odd number
of "}" characters, but doesn't match at all if there is an even number.
That aspect is very much like the current bug report.
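
For anyone who wants to try those strings without typing each one in, here's
a rough sketch of a loop that does it.  The pattern is the one from the
report; the helper name check_brace_counts is made up for illustration.  On
an sre with the bug, the even counts should come back as "no match"; on a
fixed engine I'd expect rest='' on every line.

```python
import re

# Pattern taken verbatim from the original report.
PATTERN = re.compile(r'(?ms).*?}\s*\Z(?P<rest>.*)')

def check_brace_counts(max_count=5):
    """Return [(n, rest_or_None)] for strings of n '}' plus two newlines.

    Hypothetical helper, not part of the re module.
    """
    results = []
    for n in range(1, max_count + 1):
        text = '}' * n + '\012\012'
        m = PATTERN.match(text)
        # None means the regexp failed to match at all.
        results.append((n, m.group('rest') if m else None))
    return results

for n, rest in check_brace_counts():
    status = 'no match' if rest is None else 'rest=%r' % rest
    print('%d brace(s): %s' % (n, status))
```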

> ... any hints how to go around this bug without updating Python
> from CVS after it is fixed?

It's unclear what you're trying to accomplish.  Tell us in words what it is
you're trying to match, and I'm sure we can find an equivalent regexp that
doesn't use *?.  As is, your regexp should match any string whatsoever that
ends with a } followed by optional whitespace, and set groupdict()['rest'] to
an empty string.  It can never set 'rest' to anything other than an empty
string, because \Z matches only at the very end of the string, which leaves
nothing for the trailing .* to consume.  If that's really what you intended,
then

    r'(?s).*}\s*\Z(?P<rest>)'

is a simpler and faster way to accomplish that.  But I doubt that's what you
intended.
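
To check that the two patterns really do agree, something along these lines
should do.  It's only a sketch: the sample strings and the helper names
rest_of and compare are my own inventions, and it assumes an engine where the
*? bug is not in play (on a buggy sre the "original" column may differ).

```python
import re

ORIGINAL = re.compile(r'(?ms).*?}\s*\Z(?P<rest>.*)')  # pattern from the report
SIMPLER = re.compile(r'(?s).*}\s*\Z(?P<rest>)')       # greedy rewrite, no *?

# Illustrative samples only; the first two come from the report.
SAMPLES = ['{}\012}\012', 'aa\012}\012', '}', 'no closing brace',
           '} trailing text']

def rest_of(pattern, text):
    """Return the 'rest' group, or None if the pattern doesn't match."""
    m = pattern.match(text)
    return m.groupdict()['rest'] if m else None

def compare(samples=SAMPLES):
    """Return [(text, original_rest, simpler_rest)] for each sample."""
    return [(text, rest_of(ORIGINAL, text), rest_of(SIMPLER, text))
            for text in samples]

for text, original_rest, simpler_rest in compare():
    print('%r: original=%r simpler=%r' % (text, original_rest, simpler_rest))
```

Whenever either pattern matches, 'rest' is empty, since \Z has already
pinned the match to the end of the string.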

minimal-matches-backtrack-too-except-they-grow-longer-and-longer-
   ly y'rs  - tim




