PEP 3131: Supporting Non-ASCII Identifiers

Josiah Carlson josiah.carlson at sbcglobal.net
Sun May 13 15:58:27 EDT 2007


Stefan Behnel wrote:
> Anton Vredegoor wrote:
>>> In summary, this PEP proposes to allow non-ASCII letters as
>>> identifiers in Python. If the PEP is accepted, the following
>>> identifiers would also become valid as class, function, or
>>> variable names: Löffelstiel, changé, ошибка, or 売り場
>>> (hoping that the latter one means "counter").
>> I am against this PEP for the following reasons:
>>
>> It will split up the Python user community into different language or
>> interest groups without having any benefit as to making the language
>> more expressive in an algorithmic way.
> 
> We must distinguish between "identifiers named in a non-english language" and
> "identifiers written with non-ASCII characters".
[snip]
> I do not think non-ASCII characters make this 'problem' any worse. So I must
> ask people to restrict their comments to the actual problem that this PEP is
> trying to solve.

Really?  Because when I am reading source code, even if a particular
variable *name* is a sequence of characters that I cannot identify as a
word that I know, I can at least spell it out using Latin characters, or
perhaps even attempt to pronounce it (I find that verbalizing a word,
even incorrectly, helps me to remember a variable and use it later).

On the other hand, the introduction of some 60k+ valid Unicode glyphs
into the set of characters that can appear in a name in Python would
make any such attempts by anyone who is not a native speaker (and even
native speakers in the case of the more obscure Kanji glyphs) an
exercise in futility.

As it stands, people who use Python (and the vast majority of other
programming languages) learn the 52 upper/lowercase letters of the
Latin alphabet (and, in some parts of the world, the digits 0-9 as
well).  That's it.  62 glyphs at the worst.  But a huge portion
of these people have already been exposed to these characters through 
school, the internet, etc., and this isn't likely to change (regardless 
of the 'impending' Chinese population dominance on the internet).

Indeed, the lack of the 60k+ glyphs as valid name characters can make 
the teaching of Python to groups of people that haven't been exposed to 
the Latin alphabet more difficult, but those people who are exposed to 
programming are also typically exposed to the internet, on which Latin 
alphabets dominate (never mind that HTML tags are Latin characters, as
are just about every daemon configuration file, etc.).  Exposure to the 
Latin alphabet isn't going to go away, and Python is very unlikely to be 
the first exposure programmers have to the Latin alphabet (except for 
OLPC, but this PEP is about a year late to the game to change that). 
And even if Python *is* the first time children or adults are exposed to 
the Latin alphabet, one would hope that 62 characters to learn to 'speak 
the language of Python' is a small price to pay to use it.

Regarding different characters sharing the same glyphs, it is a problem.
Say that you are importing a module written by a mathematician that
uses an actual capital Greek alpha for a name.  When a user sits down to
use it, they could certainly get NameErrors, AttributeErrors, etc., and
never understand why.  Their fancy-schmancy Unicode-enabled terminal
will show them what looks like the Latin A, but it will in fact be the
Greek Α.  Until they copy/paste, check its ord(), etc., they will be
baffled.  It isn't a problem now because A = Α is a syntax error, but it
can and will become a problem if it is allowed to.
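
To make the ord() check concrete, here is a minimal sketch.  It assumes
an interpreter that handles Unicode strings (Python 3 here); the helper
name describe() is made up for illustration, and a plain dict stands in
for the namespace of the hypothetical mathematician's module.

```python
import unicodedata


def describe(ch):
    """Return (code point, official Unicode name) for one character."""
    return ord(ch), unicodedata.name(ch)


# Escapes are used so the two characters cannot be confused in an editor.
latin_a = "\u0041"      # Latin capital letter A
greek_alpha = "\u0391"  # Greek capital letter Alpha

print(describe(latin_a))       # (65, 'LATIN CAPITAL LETTER A')
print(describe(greek_alpha))   # (913, 'GREEK CAPITAL LETTER ALPHA')
print(latin_a == greek_alpha)  # False

# Hypothetical namespace that defines only the Greek name.
namespace = {greek_alpha: 1}
try:
    # Looking up the Latin look-alike fails, as described above.
    eval(latin_a, namespace)
except NameError as exc:
    print("NameError:", exc)
```

The two names render almost identically in many fonts, yet the lookup
with the Latin letter fails; only the code points tell them apart.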

But this issue isn't limited to different characters sharing glyphs!
It's also about being able to type names to use them in your own code
(generally very difficult if not impossible for many non-Latin
characters), or even being able to display them.  And no number of
guidelines, suggestions, etc., against distributing libraries with
non-Latin identifiers will stop it from happening, and it *will*
fragment the community, as Anton (and others) have stated.

  - Josiah


