[New-bugs-announce] [issue37367] octal escapes applied inconsistently throughout the interpreter and lib

Fri Jun 21 15:46:08 EDT 2019

New submission from Dan Snider <mr.assume.away at gmail.com>:

At present, the bytecode compiler can generate 512 different unicode characters from octal escapes, one for each integer in the range [0, 512), 512 being the total number of syntactically valid sequences of 3 octal digits preceded by a backslash. However, this does not match the regex compiler, which raises an error, regardless of the input type, when it encounters an octal escape with a value greater than 255. On the other hand... the bytes literal:

>>> b'\407'
b'\x07'

is somehow valid, and can lead to bugs that are extremely difficult to track down, such as this nonsense:

>>> import re
>>> re.compile(b'\407').search(b'\a')
<re.Match object; span=(0, 1), match=b'\x07'>
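
The three behaviours described above can be reproduced side by side. This is a minimal sketch, not a test from the tracker: the variable names are illustrative, and the literals are built with eval() under a warnings filter on the assumption that newer Python versions may warn about octal escapes above \377.

```python
import re
import warnings

# Assumption: newer Python versions may warn about octal escapes above \377,
# so the literals are compiled via eval() with warnings silenced.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    as_str = eval(r"'\407'")     # str literal: all nine bits are kept
    as_bytes = eval(r"b'\407'")  # bytes literal: the 9th bit is dropped

print(hex(ord(as_str)))  # 0x107
print(as_bytes)          # b'\x07'

# Raw patterns leave the escape for the regex parser to interpret, and it
# rejects the value for both str and bytes patterns.
rejected = []
for pattern in (r"\407", rb"\407"):
    try:
        re.compile(pattern)
    except re.error:
        rejected.append(pattern)

print(len(rejected))  # 2
```

The match shown above succeeds only because the bytes literal is truncated to b'\x07' before the regex parser ever sees it.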

I propose that the regex parser be augmented to interpret, for unicode patterns, three-digit octal escapes in range(256, 512), and that the bytecode compiler be adjusted to match the behavior of the regex parser by raising an error for bytes literals containing an octal escape of \400 or greater, rather than truncating the 9th bit.
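
The proposed rule can be sketched in a few lines. The helper octal_escape_value and its is_bytes parameter are hypothetical names used only for illustration; they are not part of CPython or the re module, and the error type is an arbitrary choice.

```python
def octal_escape_value(digits: str, *, is_bytes: bool) -> int:
    """Return the value of an octal escape given its digits, e.g. '407'.

    Hypothetical helper illustrating the proposed rule; not a CPython API.
    """
    if not 1 <= len(digits) <= 3 or any(d not in "01234567" for d in digits):
        raise ValueError(f"not an octal escape: {digits!r}")
    value = int(digits, 8)
    if is_bytes and value > 0o377:
        # Proposed behaviour: reject the escape instead of masking it to 8 bits.
        raise ValueError(f"octal escape \\{digits} outside of range 0-0o377")
    return value


# Unicode context: the full range [0, 512) is accepted.
print(octal_escape_value("407", is_bytes=False))  # 263

# Bytes context: anything from \400 upward is an error rather than value & 0xFF.
try:
    octal_escape_value("407", is_bytes=True)
except ValueError as exc:
    print(exc)
```

Under this rule, the same check would apply to both the literal parser and the regex parser, so the two could no longer disagree about \407.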

----------
messages: 346246
nosy: bup
priority: normal
severity: normal
status: open
title: octal escapes applied inconsistently throughout the interpreter and lib

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue37367>
_______________________________________