[issue40980] group names of bytes regexes are strings

Quentin Wenger report at bugs.python.org
Tue Jun 16 10:34:12 EDT 2020


Quentin Wenger <wenger.quentin at bluewin.ch> added the comment:

To prove my point that the decoding to string is arbitrary:

```
>>> import re
>>> orig_name = "Ř"
>>> orig_ch = orig_name.encode("cp1250") # Because why not?
>>> name = list(re.match(b"(?P<" + orig_ch + b">)", b"").groupdict().keys())[0]
>>> name == orig_name
False
>>> name
'Ø'
>>> name.encode("latin-1") == orig_ch
True
```

For any dynamically constructed bytes regex pattern, a string group name as output is unusable as-is: only after re-encoding it with latin-1 can it be safely compared with the original bytes. This choice of latin-1 is arbitrary.
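As a rough illustration of the workaround this implies, here is a minimal sketch of a helper that re-encodes the group names so they can be compared with the bytes used to build the pattern. The name `bytes_groupdict` and its `encoding` parameter are hypothetical and not part of the `re` module. The sketch sticks to an ASCII group name so that it also runs on recent Python versions, which reject non-ASCII group names in bytes patterns.

```python
import re


def bytes_groupdict(match, encoding="latin-1"):
    """Hypothetical helper: return match.groupdict() with keys as bytes.

    Assumes the group names were decoded with latin-1, as observed above.
    """
    return {
        name.encode(encoding): value
        for name, value in match.groupdict().items()
    }


# Build a bytes pattern dynamically from a bytes group name.
group = b"word"
pattern = b"(?P<" + group + b">[a-z]+)"
m = re.match(pattern, b"hello world")

# The keys come back as str even though the pattern is bytes.
str_keys = list(m.groupdict())

# After re-encoding, the original bytes name can be used for lookup.
byte_keys = bytes_groupdict(m)
```

With this, `group in m.groupdict()` is false while `group in byte_keys` is true, which is the mismatch I am complaining about.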

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue40980>
_______________________________________

