[Python-bugs-list] [ python-Bugs-725149 ] SRE bugs with capturing groups in negative assertions
SourceForge.net
noreply@sourceforge.net
Tue, 22 Apr 2003 11:46:19 -0700
Bugs item #725149, was opened at 2003-04-21 10:22
Message generated for change (Comment added) made by glchapman
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=725149&group_id=5470
Category: Regular Expressions
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Greg Chapman (glchapman)
Assigned to: Fredrik Lundh (effbot)
Summary: SRE bugs with capturing groups in negative assertions
Initial Comment:
SRE is broken in some subtle ways when you combine
capturing groups with assertions. For example:
>>> re.match('((?!(a)c)[ab])*', 'abc').groups()
('b', '')
In the above, '(a)' has matched an empty string. Or,
worse:
>>> re.match('(a)((?!(b)*))*', 'abb').groups()
('b', None, None)
Here '(a)' matches 'b'.
Although Perl reports matches for groups in negative
assertions, I think it is better to adopt the PCRE rule
that these groups are always reported as unmatched
outside the assertion (inside the assertion, if used with
backreferences, they should behave as normal). This
would make the handling of subpatterns in negative
assertions consistent with that of subpatterns in
branches:
>>> re.match('(a)c|ab', 'ab').groups()
(None,)
In the above, although '(a)' matches before the branch
fails, the failure of the branch means '(a)' is considered
not to have matched.
Anyway, the attached patch is an effort to fix this
problem by saving the values of marks before calling the
assertion, and then restoring them afterwards (thus
undoing whatever might have been done in the assertion).
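A minimal sketch of the save/restore idea, assuming a toy matcher in which each pattern node is a Python function and the marks are a plain list of (start, end) spans. All names here (lit, seq, group, neg_lookahead, groups, example) are hypothetical; this is not SRE's C code and not the attached patch, which may track marks differently.

```python
# Toy matcher without backtracking (hypothetical; not SRE's C implementation).
# Each node is a function (s, pos, marks) -> new position, or None on failure.
# `marks` is a mutable list holding a (start, end) span per capturing group.

def lit(ch):
    """Match a single literal character."""
    def m(s, pos, marks):
        return pos + 1 if s.startswith(ch, pos) else None
    return m


def seq(*parts):
    """Match each part in order; fail as soon as one part fails."""
    def m(s, pos, marks):
        for part in parts:
            pos = part(s, pos, marks)
            if pos is None:
                return None
        return pos
    return m


def group(idx, inner):
    """Capturing group: record the span matched by `inner` in marks[idx]."""
    def m(s, pos, marks):
        end = inner(s, pos, marks)
        if end is not None:
            marks[idx] = (pos, end)
        return end
    return m


def neg_lookahead(inner, restore=True):
    """(?!inner): succeed without consuming input iff `inner` fails here."""
    def m(s, pos, marks):
        saved = list(marks)        # save the marks before calling the assertion
        matched = inner(s, pos, marks) is not None
        if restore:
            marks[:] = saved       # undo whatever the assertion did to the marks
        return None if matched else pos
    return m


def groups(pattern, s, ngroups):
    """Rough analogue of re.match(...).groups() for this toy matcher."""
    marks = [None] * ngroups
    if pattern(s, 0, marks) is None:
        return None
    return tuple(None if span is None else s[span[0]:span[1]] for span in marks)


def example(restore):
    # Toy equivalent of the pattern '(?!(a)c)a'
    return seq(neg_lookahead(seq(group(0, lit('a')), lit('c')), restore), lit('a'))
```

With restore=True, groups(example(True), 'ab', 1) gives (None,), matching the PCRE rule; with restore=False the capture made inside the assertion leaks out and groups(example(False), 'ab', 1) gives ('a',).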
----------------------------------------------------------------------
>Comment By: Greg Chapman (glchapman)
Date: 2003-04-22 10:46
Message:
Logged In: YES
user_id=86307
On further thought, I realized that positive assertions are also
affected by the second problem. For example:
>>> re.match('(a)(?:(?=(b)*)c)*', 'abb').groups()
('b', None)
The problem here is that a successful match in an assertion
can leave marks at the top of the mark stack which then get
popped in the wrong place. I am attaching a new patch that
should catch this problem for both kinds of assertions (and
that should also "unmark" groups in negative assertions).
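A simplified sketch of how a leftover mark-stack entry gets "popped in the wrong place", and of one way to guard against it: remember the stack depth before running the assertion and discard anything above it afterwards. The MarkState class and the demo functions are hypothetical stand-ins; SRE's real mark stack stores raw pointers and is not modeled exactly here.

```python
# Simplified mark-stack model (hypothetical; not SRE's actual data structures).

class MarkState:
    def __init__(self, ngroups):
        self.marks = [None] * ngroups   # current (start, end) span per group
        self.stack = []                 # saved snapshots of `marks`

    def push(self):
        self.stack.append(list(self.marks))

    def pop(self):
        self.marks = self.stack.pop()


def assertion_body(state):
    # Simulates an assertion whose inner repeat pushed a snapshot and then
    # succeeded without ever popping it.
    state.marks[1] = (1, 2)
    state.push()
    state.marks[1] = (2, 3)


def run_assertion(state, body, fix):
    depth = len(state.stack)            # stack depth before the assertion
    body(state)
    if fix:
        del state.stack[depth:]         # drop anything the assertion left behind


def demo(fix):
    state = MarkState(2)
    state.marks[0] = (0, 1)             # group 1 has matched 'a'
    state.push()                        # outer repeat saves marks before an iteration
    run_assertion(state, assertion_body, fix)
    state.pop()                         # iteration fails: restore the saved marks
    return tuple(state.marks), len(state.stack)
```

Without the fix, demo(False) returns (((0, 1), (1, 2)), 1): the pop picks up the assertion's snapshot instead of the outer one, and the outer snapshot is left stranded on the stack. With the fix, demo(True) returns (((0, 1), None), 0).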
----------------------------------------------------------------------