[Python-bugs-list] [ python-Bugs-725149 ] SRE bugs with capturing groups in negative assertions
SourceForge.net
noreply@sourceforge.net
Sun, 27 Apr 2003 06:28:05 -0700
Bugs item #725149, was opened at 2003-04-21 18:22
Message generated for change (Comment added) made by niemeyer
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=725149&group_id=5470
Category: Regular Expressions
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Greg Chapman (glchapman)
>Assigned to: Gustavo Niemeyer (niemeyer)
Summary: SRE bugs with capturing groups in negative assertions
Initial Comment:
SRE is broken in some subtle ways when you combine
capturing groups with assertions. For example:
>>> re.match('((?!(a)c)[ab])*', 'abc').groups()
('b', '')
In the above '(a)' has matched an empty string. Or
worse:
>>> re.match('(a)((?!(b)*))*', 'abb').groups()
('b', None, None)
Here '(a)' matches 'b'.
Although Perl reports matches for groups in negative
assertions, I think it is better to adopt the PCRE rule
that these groups are always reported as unmatched
outside the assertion (inside the assertion, if used with
backreferences, they should behave as normal). This
would make the handling of subpatterns in negative
assertions consistent with that of subpatterns in
branches:
>>> re.match('(a)c|ab', 'ab').groups()
(None,)
In the above, although '(a)' matches before the branch
fails, the failure of the branch means '(a)' is considered
not to have matched.
Anyway, the attached patch is an effort to fix this
problem by saving the values of marks before calling the
assertion, and then restoring them afterwards (thus
undoing whatever might have been done in the assertion).
----------------------------------------------------------------------
>Comment By: Gustavo Niemeyer (niemeyer)
Date: 2003-04-27 13:28
Message:
Logged In: YES
user_id=7887
Greg, I think there are two different issues here.
One of them is related to a wrong behavior from
mark_save/restore(), which don't restore the stackbase
before restoring the marks. Asserts were afected because
they're the only recursive ops that can continue the loop,
but the problem would happen to any operation with the same
behavior. So, rather than hardcoding this into asserts, I
have changed mark_save/restore() to always restore the
stackbase before restoring the marks. This should fix these
two cases you presented:
>>> re.match('(a)(?:(?=(b)*)c)*', 'abb').groups()
>>> re.match('(a)((?!(b)*))*', 'abb').groups()
And was applied as:
Modules/_sre.c: 2.95
Lib/test/test_re.py: 1.41
The other issue is related to the asserts which are leaving
half-marked groups. While your solution does work, it
changes the current behavior, which is also compatible to
how perl works. I understand that individual groups matching
when the whole string doesn't match is atypical. OTOH, a
negative assertion *is* atypical, and IMO denying external
group access won't help the user to understand how it works.
In other words, I think it's better to have incomplete
support for the moment, than having none.
This way, we can think further about this, and look for an
elegant solution to fix that support, certainly including
some algorithm to check for half-marked groups.
Thank you very much for spotting these bugs, and submitting
a solution for them.
----------------------------------------------------------------------
Comment By: Greg Chapman (glchapman)
Date: 2003-04-22 18:46
Message:
Logged In: YES
user_id=86307
In thinking further, I realized that positive assertions are also
affected by the second problem. E.g.:
>>> re.match('(a)(?:(?=(b)*)c)*', 'abb').groups()
('b', None)
The problem here is that a successful match in an assertion
can leave marks at the top of the mark stack which then get
popped in the wrong place. Attaching a new patch which
should catch this problem for both kinds of assertions (and
which also should "unmark" groups in negative assertions).
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=725149&group_id=5470