[New-bugs-announce] [issue45674] From Python 3.7, sre_parse.parse() do not create SubPattern instances that can be used to back reproduce original expression if containing non-capturing groups

Tristan report at bugs.python.org
Fri Oct 29 14:35:49 EDT 2021


New submission from Tristan <trislatr at gmail.com>:

Since Python 3.7, sre_parse.parse() no longer creates SubPattern instances that can be used to reproduce the original expression when it contains non-capturing groups.

In Python 3.6: 

>>> import sre_parse
>>> sre_parse.parse("(?:foo (?:bar) | (?:baz))").dump()
SUBPATTERN None 0 0
  BRANCH
    LITERAL 102
    LITERAL 111
    LITERAL 111
    LITERAL 32
    SUBPATTERN None 0 0
      LITERAL 98
      LITERAL 97
      LITERAL 114
    LITERAL 32
  OR
    LITERAL 32
    SUBPATTERN None 0 0
      LITERAL 98
      LITERAL 97
      LITERAL 122


In Python 3.7 and beyond: 

>>> import sre_parse
>>> sre_parse.parse("(?:foo (?:bar) | (?:baz))").dump()
BRANCH
  LITERAL 102
  LITERAL 111
  LITERAL 111
  LITERAL 32
  LITERAL 98
  LITERAL 97
  LITERAL 114
  LITERAL 32
OR
  LITERAL 32
  LITERAL 98
  LITERAL 97
  LITERAL 122

This behaviour makes it impossible to write a correct colorizer for regular expressions using the sre_parse module from Python 3.7 onward, because the non-capturing groups no longer appear in the parsed tree. I'm not a regex expert, so I cannot say whether this change has any effect on the matching itself, but if I trust regex101, it will add a capturing group in place of the non-capturing group.
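
One way to check whether matching is affected, without relying on regex101, is to ask the re module how many capturing groups the compiled pattern has. The following is only a minimal sketch and not part of the original report: the names PATTERN, compiled and group_count are chosen here for illustration, and it uses only the public re API rather than sre_parse.

```python
import re

# The pattern from the report; no re.VERBOSE flag, so spaces are literal.
PATTERN = "(?:foo (?:bar) | (?:baz))"

compiled = re.compile(PATTERN)

# Number of capturing groups the compiled pattern exposes.
group_count = compiled.groups

# One sample string for each branch of the alternation.
first = compiled.fullmatch("foo bar ")
second = compiled.fullmatch(" baz")

print("capturing groups:", group_count)
print("first branch groups:", first.groups())
print("second branch groups:", second.groups())
```

If group_count is zero and both match objects report an empty groups() tuple, the missing SUBPATTERN nodes affect only tools that rebuild the expression from the parse tree, not what the pattern matches.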

----------
components: Regular Expressions
messages: 405327
nosy: ezio.melotti, mrabarnett, tristanlatr
priority: normal
severity: normal
status: open
title: From Python 3.7, sre_parse.parse() do not create SubPattern instances that can be used to back reproduce original expression if containing non-capturing groups
type: behavior
versions: Python 3.7

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue45674>
_______________________________________

