[Python-ideas] Python octal escape character encoding "wats"

Chris Angelico rosuav at gmail.com
Fri Nov 9 23:39:36 EST 2018


On Sat, Nov 10, 2018 at 3:19 PM Steven D'Aprano <steve at pearwood.info> wrote:
>
> On Sat, Nov 10, 2018 at 12:56:07PM +1100, Chris Angelico wrote:
>
> > Not ambiguous. It takes as many valid octal digits as it can.
>
> What is the rationale for that? Hex escapes don't.

Irrelevant to whether it's ambiguous or not.
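
For reference, here is a quick sketch of the two rules as I understand them (the specific literals are just my own illustrations): an octal escape takes one to three octal digits and stops at the first character that isn't one, whereas \x requires exactly two hex digits.

```python
# Octal escapes: one to three octal digits, taken greedily.
assert "\7" == "\x07"           # one digit is enough
assert "\50" == "("             # two digits: 0o50 == 40 == ord("(")
assert "\1234" == "S4"          # stops after three digits: \123 is "S", then "4"
assert "\58" == "\x05" + "8"    # "8" is not an octal digit, so the escape is just \5

# Hex escapes: exactly two hex digits, no more and no fewer.
assert "\x413" == "A3"          # \x41 is "A", then "3"
try:
    eval(r'"\x4"')              # only one hex digit in the literal
except SyntaxError:
    short_hex_is_error = True
else:
    short_hex_is_error = False
assert short_hex_is_error
```

Either way there is exactly one way to read any given literal, which is all "not ambiguous" is claiming.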

> > "Up to" means that one or two digits can also define a character. For
> > obvious reasons, it has to take digits greedily (otherwise "\777"
> > would be "\x07" followed by "77"), and it's not an error to have fewer
> > digits.
>
> In hindsight, I think we should have insisted that octal escapes must
> always be three digits, just as hex escapes are always two. The status
> quo has too much magical "Do What I Mean" in it for my liking:
>
> py> '\509\51'  # pair of brackets surrounding a nine
> '(9)'
> py> '\507\51'  # pair of brackets surrounding a seven
> 'G)'
>
> Dammit Python, that's not what I meant!

How often do you actually do that with octal escapes, though? Ever had
actual real-world situations where this comes up? I don't recall
*ever* coming across a problem where sometimes I have an octal escape
followed by a nine, and other times by a different digit. I also do
not recall often wanting an octal escape followed by a digit, even
without that confusion.
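
If it ever does come up, writing all three digits avoids the question entirely, since an octal escape never takes more than three. A minimal sketch (the variable names are only for illustration):

```python
# Two-digit escapes depend on what follows them.
short_nine = "\509\51"        # "9" is not an octal digit, so \50 ends early
assert short_nine == "(9)"

# Zero-padded to three digits, the following digit is always literal text.
padded_nine = "\0509\051"
padded_seven = "\0507\051"
assert padded_nine == "(9)"
assert padded_seven == "(7)"
```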

> > > what do you say
> > > of deprecating any r"\[0-9]{1,3}" sequence that doesn't match full 3
> > > octal digits, and yield a syntax error for that from Python 3.9 (or
> > > 3.10) on?
> >
> > Nope. Would break code for no good reason.
>
> There's a good reason: to make the behaviour more sensible and less
> confusing and have fewer "oops, that's not what I wanted" bugs. But we
> should have made that change for 3.0. Now, I agree: it would be breakage
> where the benefit doesn't outweigh the cost.

We can debate whether it would be, in the abstract, better to mandate
exactly three digits, or to allow fewer. But I think we're all agreed
that it is nowhere _near_ enough of a problem to justify the breakage.
I perhaps exaggerated slightly in saying "no" good reason, but the
reason is certainly not good enough to justify the change.

> Maybe in Python 5000.
>
> In the meantime, one or two digit octal escapes ought to be a linter
> warning.

Maybe. Or just have the editor colour the octal escape differently;
that way, the end of the colour will tell you if the language is
misinterpreting your intentions. Either way, yeah, something that
tooling can help with.
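
As a rough idea of what such a check might look like, here is a sketch that scans the text between the quotes of a non-raw string literal and reports octal escapes written with fewer than three digits. The function name and the regex are my own invention, not taken from any existing linter, and a real tool would work on the token stream rather than raw text.

```python
import re

# Hypothetical helper, not part of any real linter.
# Each match is one backslash escape: either up to three octal digits,
# or a backslash followed by any single character (which also consumes
# an escaped backslash, so r"\\50" is not misread as an octal escape).
_ESCAPE = re.compile(r"\\([0-7]{1,3}|.)", re.DOTALL)


def short_octal_escapes(literal_body):
    """Return (offset, text) for each octal escape with fewer than three digits.

    `literal_body` is the raw source text between the quotes of a
    non-raw string literal.
    """
    found = []
    for m in _ESCAPE.finditer(literal_body):
        digits = m.group(1)
        if digits[0] in "01234567" and len(digits) < 3:
            found.append((m.start(), m.group(0)))
    return found


# The two literals from earlier in the thread, passed as raw source text.
print(short_octal_escapes(r"\509\51"))
print(short_octal_escapes(r"\507\51"))
```

Note that this flags a bare \0 as well, which is probably the most common short octal escape in real code, so a practical linter might want to exempt it.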

ChrisA

