Benefits of unicode identifiers (was: Allow additional separator in identifiers)

Thu Nov 23 19:57:03 EST 2017

On Thu, Nov 23, 2017 at 2:19 PM, Richard Damon <Richard at damon-family.org> wrote:
> The Unicode Standard provides a fairly good classification of the
> characters, and it would make sense to define that any character that is
> defined as a 'Letter' or a 'Number', and some classes of Punctuation
> (connector and dash) be allowed in identifiers.
>
> Fully implementing this may be more complicated than it is worth. An interim
> simple solution would be to just allow ALL (or maybe most, excluding a limited
> number of obvious exceptions) of the characters above the ASCII set, with a
> warning that only those classified as above are promised to remain valid,
> and that other characters, while currently not generating a syntax error,
> may do so in the future. It should also be stated that while currently no
> character normalization is being done, it may be added in the future, so
> identifiers that differ only by code point sequences that are defined as
> being equivalent, might in the future not be distinct.

It's already implemented; nothing needs to be done. Unicode Standard
Annex #31 defines a recommended syntax for identifiers, which Python
follows closely, except that for backward compatibility Python also
allows identifiers to begin with an underscore. Compare the
recommended syntax at
http://unicode.org/reports/tr31/#Default_Identifier_Syntax with the
Python syntax at
https://docs.python.org/3/reference/lexical_analysis.html#identifiers.
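
For anyone who wants to check this at the interpreter, here is a rough
sketch using only the standard library. The sample names and the check()
helper are made up for illustration. Note that dash punctuation is not
accepted, while connector punctuation is, and that Python 3 normalizes
identifiers to NFKC when parsing, as the reference page linked above
describes.

```python
import unicodedata

# Hypothetical sample names, chosen only for illustration.
samples = ["naïve", "_private", "变量", "a\u203fb", "2fast", "a-b", "€uro"]


def check(name: str) -> bool:
    """Return True if `name` is lexically a valid identifier.

    This is only a thin wrapper around str.isidentifier(); it does not
    exclude reserved words such as "class".
    """
    return name.isidentifier()


for name in samples:
    print(f"{name!r}: {check(name)}")

# The parser normalizes identifiers to NFKC, so the ligature U+FB01
# followed by "le" ends up naming the same variable as plain "file".
namespace = {}
exec("\ufb01le = 1", namespace)
print("file" in namespace)
print(unicodedata.normalize("NFKC", "\ufb01le") == "file")
```

Letters from any script and the connector U+203F (UNDERTIE) pass the
check, while a leading digit, an ASCII hyphen, and a currency symbol do
not.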