[issue40980] group names of bytes regexes are strings

Quentin Wenger report at bugs.python.org
Tue Jun 16 15:41:48 EDT 2020


Quentin Wenger <wenger.quentin at bluewin.ch> added the comment:

And there's no need for a cryptic encoding like cp1250 for this problem to arise. Here is a simple example with Python's default encoding utf-8:

```
>>> a = "ú"
>>> b = list(re.match(b"(?P<" + a.encode() + b">)", b"").groupdict())[0]
>>> a.isidentifier()
True
>>> b.isidentifier()
True
>>> b
'ú'
>>> a.encode() == b.encode("latin1")
True
```

For reference, here is the very source of the issue: https://github.com/python/cpython/blob/master/Lib/sre_parse.py#L228

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue40980>
_______________________________________


More information about the Python-bugs-list mailing list