[issue35859] Capture behavior depends on the order of an alternation

Ma Lin report at bugs.python.org
Mon Mar 4 04:59:51 EST 2019


Ma Lin <malincns at 163.com> added the comment:

Found another bug in re:

>>> re.match(r'(?:.*?\b(?=(\t)|(x))x)*', 'a\txa\tx').groups()
('\t', 'x')

Expected result: (None, 'x')

PHP 7.3.2           NULL, "x"
Java 11.0.2         "\t", "x"
Perl 5.28.1         "\t", "x"
Ruby 2.6.1          nil, "x"
Go 1.12             doesn't support lookaround
Rust 1.32.0         doesn't support lookaround
Node.js 10.15.1     undefined, "x"
regex 2019.2.21     None, "x"
re                  "\t", "x"

This is a very rare bug, can be fixed by adding MARH_PUSH() before JUMP_MIN_REPEAT_ONE. And maybe other JUMPs should MARK_PUSH() as well.

I'm impressed with regex module, it never went wrong.
IMHO, I would like to see a pruned version be adopted into stdlib.

~~~~~~~~~~~~~~~~~~~~~~
> Interesting sidelights 1
> Found a Perl bug

I reported to Perl, it's a bug in perl-5.26, and already fixed in perl-5.28.0.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35859>
_______________________________________


More information about the Python-bugs-list mailing list