[issue23263] Python 3 gives misleading errors when validating unicode identifiers

Sun Jan 18 06:44:41 CET 2015

New submission from Matt Bachmann:

PEP 3131 changed the definition of valid identifiers to match this pattern:

<XID_Start> <XID_Continue>* .

Currently, if you have an invalid character in an identifier, you get this error:

☺ = 4
SyntaxError: invalid character in identifier

This is fine in most cases. But in some cases the problem is not that the character is invalid, but that it may not be used to START the identifier. One example is the "combining grave accent", which is an XID_CONTINUE character but not an XID_START character.

So ̀e is an invalid identifier but è is a valid identifier. So the ̀ character is not invalid in all cases.
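
As a rough illustration of the distinction (not part of the attached patch), the three cases can be probed from Python itself. This sketch assumes that str.isidentifier() is an acceptable stand-in for the XID_START / XID_CONTINUE tests; the helper name classify is hypothetical. Note that str.isidentifier() follows Python's own rules, so "_" also counts as a start character.

```python
def classify(ch):
    """Classify one character by where Python allows it in an identifier.

    Hypothetical helper: it relies on str.isidentifier() rather than
    reading the Unicode XID_Start / XID_Continue properties directly.
    """
    if len(ch) != 1:
        raise ValueError("expected a single character")
    if ch.isidentifier():
        # Valid on its own, so it may start an identifier.
        return "start"
    if ("a" + ch).isidentifier():
        # Only valid after a start character.
        return "continue-only"
    return "invalid"


print(classify("e"))       # ordinary letter
print(classify("\u0300"))  # COMBINING GRAVE ACCENT
print(classify("\u263a"))  # WHITE SMILING FACE
```

The combining grave accent falls in the middle category, which is the case the current error message describes poorly.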

The attached patch attempts to clarify this by providing a different error when the start character is invalid.

>>> ̀e = 4
  File "<stdin>", line 1
    ̀e = 4
     ^
SyntaxError: invalid start character in identifier

However, if the character is simply not allowed (it is neither an XID_START nor an XID_CONTINUE character), the original error is used:
>>> ☺smile = 4
  File "<stdin>", line 1
    ☺smile = 4
         ^
SyntaxError: invalid character in identifier
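
The choice between the two messages can be modelled in a few lines of Python. This is only a sketch of the behaviour described above, not the patch itself (which changes the tokenizer); the function name identifier_error is hypothetical, and str.isidentifier() is again assumed to be a fair approximation of the XID checks.

```python
def identifier_error(name):
    """Return the proposed error message for name, or None if it is valid.

    Hypothetical model of the behaviour described in this report; the
    real check happens in the tokenizer, not in Python code.
    """
    if not name:
        raise ValueError("expected a non-empty string")
    if name.isidentifier():
        return None
    first = name[0]
    # The first character would be fine later in the name, just not first.
    if not first.isidentifier() and ("a" + first).isidentifier():
        return "invalid start character in identifier"
    return "invalid character in identifier"


print(identifier_error("\u0300e"))      # combining grave accent first
print(identifier_error("\u263asmile"))  # character not allowed anywhere
print(identifier_error("e\u0300"))      # valid identifier
```

One caveat with this model: a name such as "1abc" would also be reported as having an invalid start character, whereas the real tokenizer would try to read the leading digit as a number before any identifier check applies.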

----------
components: Unicode
files: clarify_unicode_identifier_errors.patch
keywords: patch
messages: 234222
nosy: Matt.Bachmann, ezio.melotti, haypo
priority: normal
severity: normal
status: open
title: Python 3 gives misleading errors when validating unicode identifiers
type: enhancement
versions: Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6
Added file: http://bugs.python.org/file37755/clarify_unicode_identifier_errors.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue23263>
_______________________________________