[Python-3000] Unicode identifiers (Was: sets in P3K?)

Sat Apr 29 00:04:31 CEST 2006

Guido van Rossum wrote:
>> I was hoping to propose a PEP on non-ASCII identifiers some
>> day; that would (of course) include a requirement that the
>> standard library would always be restricted to ASCII-only
>> identifiers as a style-guide.
> 
> IMO communication about code becomes much more cumbersome if there are
> non-ASCII letters in identifiers, and the rules about what's a letter,
> what's a digit, and what separates two identifiers become murky.

It depends on the language you use to communicate. In English,
it is certainly cumbersome to talk about Chinese identifiers.
OTOH, I believe it is just as cumbersome to communicate about
English identifiers in Chinese, because the speakers might
not even know the natural-language concept behind the
identifiers, and because they can't pronounce them.

As for lexical aspects: these are really straightforward.
In principle, it would be possible to allow any non-ASCII
character as part of an identifier: all punctuation is ASCII,
so anything non-ASCII can't possibly be punctuation for the
language. However, that much freedom would be confusing;
the Unicode Consortium has established rules for which characters
should be allowed in identifiers, and these rules are intended to
match the intuition of the users of these characters.
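
To make the difference concrete, here is a rough sketch comparing
the "anything non-ASCII" rule with a Unicode-based rule. The helper
naive_ident_char is hypothetical and exists only for illustration.
The Unicode side assumes a Python 3 interpreter, whose
str.isidentifier() is used here as a stand-in for the consortium's
rules; nothing like it is part of the proposal itself.

```python
import unicodedata

def naive_ident_char(ch):
    # Hypothetical "maximum freedom" rule: every non-ASCII character is
    # accepted, plus the usual ASCII letters, digits and underscore.
    if ord(ch) > 127:
        return True
    return ch == "_" or ch.isalnum()

def unicode_ident_char(ch):
    # Assumption: prefixing "_" turns the question "may ch appear inside
    # an identifier?" into a whole-string check with str.isidentifier().
    return ("_" + ch).isidentifier()

for ch in ["é", "变", "→", "€"]:
    print(repr(ch), unicodedata.category(ch),
          naive_ident_char(ch), unicode_ident_char(ch))
```

The arrow and the euro sign pass the naive rule but not the
Unicode-based one, which is the kind of confusion the stricter
rules avoid.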

The distinction between letters and digits is also straightforward:
a digit is ASCII [0-9]; it's a separate lexical class only
because it plays a special role in (number) literals. More
generally, there is the distinction between starter characters,
which may begin an identifier, and non-starter characters, which
may only appear after the first one.
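
A minimal sketch of the starter/non-starter split, assuming it is
approximated by Unicode general categories: letters and letter
numbers start an identifier, while marks, decimal digits and
connector punctuation may only continue one. The category sets and
function names below are my own illustration, not a specification.
Note that this treats every decimal digit (category Nd) as a
non-starter, which is broader than ASCII [0-9].

```python
import unicodedata

# Assumed approximation: categories allowed at the start of an identifier.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# Assumed approximation: additional categories allowed after the start.
CONTINUE_ONLY_CATEGORIES = {"Mn", "Mc", "Nd", "Pc"}

def is_starter(ch):
    """True if ch may begin an identifier (underscore is a special case)."""
    return ch == "_" or unicodedata.category(ch) in START_CATEGORIES

def is_non_starter(ch):
    """True if ch may continue an identifier but not begin one."""
    if is_starter(ch):
        return False
    return unicodedata.category(ch) in CONTINUE_ONLY_CATEGORIES

for ch in ["a", "é", "_", "7", "\u0301", "+"]:
    print(repr(ch), unicodedata.category(ch),
          is_starter(ch), is_non_starter(ch))
```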

An identifier ends when the first non-identifier character
is encountered (although I don't think there are many places
in Python where you can have two identifiers immediately following
each other).
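
The "ends at the first non-identifier character" rule can be
sketched as a small scanner. This assumes a Python 3 interpreter
and uses str.isidentifier() as the character test; scan_identifier
is a hypothetical helper, not how the actual tokenizer is written.

```python
def scan_identifier(text, pos=0):
    """Return the identifier starting at text[pos], or "" if none starts there."""
    # The first character must be a starter: a one-character string is a
    # valid identifier exactly when that character may begin one.
    if pos >= len(text) or not text[pos].isidentifier():
        return ""
    end = pos + 1
    # Prefixing "_" lets isidentifier() answer "may this character continue
    # an identifier?"; stop at the first character for which it may not.
    while end < len(text) and ("_" + text[end]).isidentifier():
        end += 1
    return text[pos:end]

print(scan_identifier("größe = 1"))
print(scan_identifier("x+y"))
```

The scanner needs no lookahead and no knowledge of the surrounding
grammar: whitespace, operators and any other non-identifier
character all terminate the identifier in the same way.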

Regards,
Martin