[issue45674] From Python 3.7, sre_parse.parse() does not create SubPattern instances that can be used to reproduce the original expression if it contains non-capturing groups
Tristan
report at bugs.python.org
Fri Oct 29 14:35:49 EDT 2021
New submission from Tristan <trislatr at gmail.com>:
From Python 3.7, sre_parse.parse() does not create SubPattern instances that can be used to reproduce the original expression if it contains non-capturing groups.
In Python 3.6:
>>> import sre_parse
>>> sre_parse.parse("(?:foo (?:bar) | (?:baz))").dump()
SUBPATTERN None 0 0
  BRANCH
    LITERAL 102
    LITERAL 111
    LITERAL 111
    LITERAL 32
    SUBPATTERN None 0 0
      LITERAL 98
      LITERAL 97
      LITERAL 114
    LITERAL 32
  OR
    LITERAL 32
    SUBPATTERN None 0 0
      LITERAL 98
      LITERAL 97
      LITERAL 122
In Python 3.7 and beyond:
>>> import sre_parse
>>> sre_parse.parse("(?:foo (?:bar) | (?:baz))").dump()
BRANCH
  LITERAL 102
  LITERAL 111
  LITERAL 111
  LITERAL 32
  LITERAL 98
  LITERAL 97
  LITERAL 114
  LITERAL 32
OR
  LITERAL 32
  LITERAL 98
  LITERAL 97
  LITERAL 122
This behaviour makes it impossible to write a correct colorizer for regular expressions using the sre_parse module from Python 3.7 onwards, because the parsed tree no longer records where the non-capturing groups were. I'm not a regex expert, so I cannot say whether this change has any effect on the matching itself, but if I trust regex101, it will add a capturing group in the place of the non-capturing group.
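A minimal sketch of how the difference could be checked programmatically, rather than by reading the dump() output. The helper names count_noncapturing_groups and _count are hypothetical, and the sketch assumes that a non-capturing group appears in the parse tree as a SUBPATTERN node whose first field (the group number) is None, as in the Python 3.6 dump above. The import fallback is there because recent versions keep the parser in re._parser.

```python
try:
    # Python 3.11+ keeps the parser in a private submodule of re.
    from re import _parser as sre_parse
except ImportError:
    # Older versions expose it as the top-level sre_parse module.
    import sre_parse


def count_noncapturing_groups(pattern):
    """Count SUBPATTERN nodes with group number None in the parse tree."""
    return _count(sre_parse.parse(pattern))


def _count(node):
    # Hypothetical helper: recursively walk SubPattern objects and the
    # tuples/lists that hold their arguments.
    total = 0
    if isinstance(node, sre_parse.SubPattern):
        for op, av in node:
            # av[0] is the group number; None marks a non-capturing group.
            if op == sre_parse.SUBPATTERN and av[0] is None:
                total += 1
            total += _count(av)
    elif isinstance(node, (tuple, list)):
        for item in node:
            total += _count(item)
    return total


if __name__ == "__main__":
    print(count_noncapturing_groups("(?:foo (?:bar) | (?:baz))"))
```

Going by the two dumps above, this should report three groups on Python 3.6 and none on Python 3.7 and later for the same expression. A group that carries inline flags, such as (?i:foo), appears to be kept as a SUBPATTERN node, so only plain (?:...) groups are affected.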
----------
components: Regular Expressions
messages: 405327
nosy: ezio.melotti, mrabarnett, tristanlatr
priority: normal
severity: normal
status: open
title: From Python 3.7, sre_parse.parse() does not create SubPattern instances that can be used to reproduce the original expression if it contains non-capturing groups
type: behavior
versions: Python 3.7
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue45674>
_______________________________________