[Python-ideas] Error handling for unknown Unicode characters (was Re: allow `lambda' to be spelled λ)

Nick Coghlan ncoghlan at gmail.com
Thu Jul 21 07:55:57 EDT 2016


On 21 July 2016 at 17:41, Nick Coghlan <ncoghlan at gmail.com> wrote:
> - the caret positioning logic for syntax errors needs to be checked to
> see if it's currently counting encoded UTF-8 bytes instead of code
> points (as that will consistently do the wrong thing on a correctly
> configured UTF-8 terminal)
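
To make the distinction in that point concrete, here is a minimal
sketch of how the two ways of counting a column diverge once a line
contains non-ASCII characters. The helper name `caret_columns` is made
up for illustration, and this is not a claim about what the compiler
does internally:

```python
def caret_columns(line, index):
    """Return (code_point_column, utf8_byte_column) for line[index].

    Hypothetical helper: both columns are 0-based counts of what
    precedes the character, measured in code points and in UTF-8 bytes.
    """
    prefix = line[:index]
    return len(prefix), len(prefix.encode("utf-8"))


# Same text as the examples in this message; the curly quotes
# (U+201C and U+201D) are written as escapes to avoid encoding surprises.
line = "varname = \u201cd\u201ca\u201dt\u201dapoint"

# Everything before the first curly quote is ASCII, so both counts agree.
print(caret_columns(line, line.index("\u201c")))  # (10, 10)

# Before the last character there are four curly quotes, each of which
# is three bytes in UTF-8, so the byte count runs ahead by eight.
print(caret_columns(line, len(line) - 1))  # (22, 30)
```

A caret placed using the byte count would land eight columns too far
to the right on a terminal that renders each code point in one cell.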

Prompted by Chris Angelico, I took a closer look at the behaviour
here, and it seems to be due to a problem with the caret being
positioned at the end of a candidate "identifier" token, rather than
at the beginning:

>>> varname = “d“a”t”apoint
  File "<stdin>", line 1
    varname = “d“a”t”apoint
                          ^
SyntaxError: invalid character in identifier
>>> varname = “d“a”t”apoint.evidence
  File "<stdin>", line 1
    varname = “d“a”t”apoint.evidence
                          ^
SyntaxError: invalid character in identifier
>>> varname = “d“a”t”apoint[evidence]
  File "<stdin>", line 1
    varname = “d“a”t”apoint[evidence]
                          ^
SyntaxError: invalid character in identifier
>>> varname = “d“a”t”apoint(evidence)
  File "<stdin>", line 1
    varname = “d“a”t”apoint(evidence)
                          ^
SyntaxError: invalid character in identifier

If you view those examples in a fixed width font, you'll see the caret
is pointing at the final "t" of "apoint" in each case, rather than at
the first problematic code point. (Even in a proportional font, while
you can't see the actual alignment, you *can* see that the alignment
isn't right.)
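
As a rough illustration of where the caret would ideally land, the
following sketch scans a candidate "identifier" token for the first
code point that isn't allowed at its position. The function names are
hypothetical, and the `"a" + ch` probe is just a convenient way of
asking `str.isidentifier()` about a continuation character; it is not
how the tokenizer itself works:

```python
def first_invalid_identifier_char(token):
    """Return the index of the first code point that cannot appear at
    that position in an identifier, or None if the token is valid."""
    for i, ch in enumerate(token):
        # The first character must be a valid identifier start; later
        # ones only need to be valid continuations, so prefix them
        # with a known-good start character before testing.
        probe = ch if i == 0 else "a" + ch
        if not probe.isidentifier():
            return i
    return None


def caret_line(column, indent=4):
    """Build a caret line for a 0-based code point column, assuming the
    source line is echoed with `indent` leading spaces."""
    return " " * (indent + column) + "^"


source = "varname = \u201cd\u201ca\u201dt\u201dapoint"
token_start = source.index("\u201c")  # start of the candidate token
bad = first_invalid_identifier_char(source[token_start:])

print(" " * 4 + source)
print(caret_line(token_start + bad))  # caret under the first curly quote
```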

By contrast, if you put an impermissible ASCII character into the
"identifier", the caret points right at it.
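
One way to compare the two cases without eyeballing terminal output is
to catch the exception and look at its `offset` attribute directly.
This is only a sketch: `syntax_error_offset` is a made-up helper name,
"$" is just one arbitrary choice of impermissible ASCII character, and
the reported offsets and messages depend on the Python version, so no
expected output is shown:

```python
def syntax_error_offset(source):
    """Compile `source` and return (offset, message) from the resulting
    SyntaxError, or None if the source compiles cleanly."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError as exc:
        return exc.offset, exc.msg
    return None


samples = (
    "varname = \u201cd\u201ca\u201dt\u201dapoint\n",  # non-ASCII offenders
    "varname = d$atapoint\n",                         # ASCII offender
)
for src in samples:
    print(repr(src), syntax_error_offset(src))
```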

If anyone's inclined to dig into the compilation toolchain to try to
figure out what's going on, I filed an issue for this particular
misbehaviour at http://bugs.python.org/issue27582

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

