[Python-bugs-list] [ python-Bugs-725149 ] SRE bugs with capturing groups in negative assertions

SourceForge.net noreply@sourceforge.net
Tue, 22 Apr 2003 11:46:19 -0700


Bugs item #725149, was opened at 2003-04-21 10:22
Message generated for change (Comment added) made by glchapman
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=725149&group_id=5470

Category: Regular Expressions
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Greg Chapman (glchapman)
Assigned to: Fredrik Lundh (effbot)
Summary: SRE bugs with capturing groups in negative assertions

Initial Comment:
SRE is broken in some subtle ways when you combine 
capturing groups with assertions.  For example:

>>> re.match('((?!(a)c)[ab])*', 'abc').groups()
('b', '')

In the above '(a)' has matched an empty string.  Or 
worse:

>>> re.match('(a)((?!(b)*))*', 'abb').groups()
('b', None, None)

Here '(a)' matches 'b'.

Although Perl reports matches for groups in negative 
assertions, I think it is better to adopt the PCRE rule 
that these groups are always reported as unmatched 
outside the assertion (inside the assertion, if used with 
backreferences, they should behave as normal).  This 
would make the handling of subpatterns in negative 
assertions consistent with that of subpatterns in 
branches:

>>> re.match('(a)c|ab', 'ab').groups()
(None,)

In the above, although '(a)' matches before the branch 
fails, the failure of the branch means '(a)' is considered 
not to have matched.

Anyway, the attached patch is an effort to fix this 
problem by saving the values of marks before calling the 
assertion, and then restoring them afterwards (thus 
undoing whatever might have been done in the assertion).


----------------------------------------------------------------------

>Comment By: Greg Chapman (glchapman)
Date: 2003-04-22 10:46

Message:
Logged In: YES 
user_id=86307

In thinking further, I realized that positive assertions are also 
affected by the second problem.  E.g.:

>>> re.match('(a)(?:(?=(b)*)c)*', 'abb').groups()
('b', None)

The problem here is that a successful match in an assertion 
can leave marks at the top of the mark stack which then get 
popped in the wrong place.  Attaching a new patch which 
should catch this problem for both kinds of assertions (and 
which also should "unmark" groups in negative assertions).


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=725149&group_id=5470