[issue40980] group names of bytes regexes are strings

Quentin Wenger report at bugs.python.org
Tue Jun 16 06:13:21 EDT 2020


Quentin Wenger <wenger.quentin at bluewin.ch> added the comment:

Agreed to some extent, but there is the difference that group names are embedded in the pattern, which has to be bytes if the target is bytes.

My use case is an all-bytes, no-string project where I construct a large regular expression at startup, with semi-dynamic group names.

So it seems natural to have everything in bytes to concatenate the regular expression, incl. the group names.

But then the group names that I receive back are strings, so I cannot look them up directly in the set of group names that I used to create the expression in the first place.
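
A minimal sketch of that mismatch, assuming two hypothetical group names (`word` and `number`) that are kept as bytes like everything else in the project:

```python
import re

# Hypothetical group names, stored as bytes like the rest of the project.
names = {b"word", b"number"}

pattern = re.compile(b"(?P<word>[a-z]+)|(?P<number>[0-9]+)")
match = pattern.match(b"abc")

# The names come back as str, even though the pattern is bytes.
print(sorted(pattern.groupindex))
print(type(match.lastgroup))

# A str never compares equal to bytes, so the direct lookup fails.
print(match.lastgroup in names)

# The lookup only works after converting the name back by hand.
print(match.lastgroup.encode("ascii") in names)
```

The same applies to the keys of `match.groupdict()`, which are also strings.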

Of course I can live with it by storing them as strings in the first place and encode()'ing them during concatenation, but it does not feel "natural".
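
Roughly, that workaround could look like the following sketch. The `parts` mapping and its sub-patterns are made up for illustration and are not taken from my actual project:

```python
import re

# Hypothetical mapping: group names kept as str, sub-patterns as bytes.
parts = {"word": b"[a-z]+", "number": b"[0-9]+"}

# Encode each name only at the point where the bytes pattern is assembled.
pattern = re.compile(
    b"|".join(
        b"(?P<" + name.encode("ascii") + b">" + sub + b")"
        for name, sub in parts.items()
    )
)

match = pattern.match(b"42")

# The str name returned by the match can now be looked up directly.
print(match.lastgroup in parts)
print(parts[match.lastgroup])
```

It works, but the names are then the only strings in an otherwise bytes-only code base.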

Furthermore, even if it is "just a name", a non-ASCII group name raises an error in a bytes pattern, even when encoded:

```
>>> re.compile("(?P<" + "é" + ">)")
re.compile('(?P<é>)')
>>> re.compile(b"(?P<" + "é".encode() + b">)")
Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    re.compile(b"(?P<" + "é".encode() + b">)")
  File "/usr/lib/python3.8/re.py", line 252, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python3.8/re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.8/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.8/sre_parse.py", line 948, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib/python3.8/sre_parse.py", line 703, in _parse
    raise source.error(msg, len(name) + 1)
re.error: bad character in group name 'é' at position 4
```

So no, it's not really "just a name", considering that in Python "é" is a valid name.
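
For reference, a small sketch of that asymmetry, using only `str.isidentifier()` and `re.compile()`. The `rejected` flag is just a helper for this example:

```python
import re

name = "é"  # the non-ASCII group name from the session above

# "é" is a valid Python identifier, and a str pattern accepts it.
print(name.isidentifier())
re.compile("(?P<" + name + ">)")

# The same name, UTF-8 encoded, is refused in a bytes pattern.
try:
    re.compile(b"(?P<" + name.encode("utf-8") + b">)")
except re.error as exc:
    rejected = True
    print("bytes pattern rejected:", exc)
else:
    rejected = False
```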

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue40980>
_______________________________________

