PEP 3131: Supporting Non-ASCII Identifiers

Mon May 14 10:42:37 EDT 2007

On May 13, 5:44 pm, "Martin v. Löwis" <mar... at v.loewis.de> wrote:

> In summary, this PEP proposes to allow non-ASCII letters as
> identifiers in Python. If the PEP is accepted, the following
> identifiers would also become valid as class, function, or
> variable names: Löffelstiel, changé, ошибка, or 売り場
> (hoping that the latter one means "counter").

I am strongly against this PEP. The serious problems and huge costs
already explained by others are not balanced by the possibility of
using non-butchered identifiers in non-ASCII alphabets, especially
considering that one can write any language, in its full Unicode
glory, in the strings and comments of suitably encoded source files.
The diatribe about cross language understanding of Python code is IMHO
off topic; if one doesn't care about international readers, using
annoying alphabets for identifiers has only a marginal impact. It's
the same situation of IRIs (a bad idea) with HTML text (happily
Unicode).

> - should non-ASCII identifiers be supported? why?
No, they are useless.
> - would you use them if it was possible to do so? in what cases?
No, never.
Being Italian, I'm sometimes tempted to use accented vowels in my
code, but I restrain myself because of the possibility of annoying
foreign readers and the difficulty of convincing every text editor I
use to preserve them

> Python code is written by many people in the world who are not familiar
> with the English language, or even well-acquainted with the Latin
> writing system.  Such developers often desire to define classes and
> functions with names in their native languages, rather than having to
> come up with an (often incorrect) English translation of the concept
> they want to name.

The described set of users includes linguistically intolerant people
who don't accept the use of suitable languages instead of their own,
and of compromised but readable spelling instead of the one they
prefer.
Most "people in the world who are not familiar with the English
language" are much more mature than that, even when they don't write
for international readers.

> The syntax of identifiers in Python will be based on the Unicode
> standard annex UAX-31 [1]_, with elaboration and changes as defined
> below.

Not providing an explicit listing of allowed characters is inexcusable
sloppiness.
The XML standard is an example of how listings of large parts of the
Unicode character set can be provided clearly, exactly and (almost)
concisely.

> ``ID_Start`` is defined as all characters having one of the general
> categories uppercase letters (Lu), lowercase letters (Ll), titlecase
> letters (Lt), modifier letters (Lm), other letters (Lo), letter numbers
> (Nl), plus the underscore (XXX what are "stability extensions" listed in
> UAX 31).
>
> ``ID_Continue`` is defined as all characters in ``ID_Start``, plus
> nonspacing marks (Mn), spacing combining marks (Mc), decimal number
> (Nd), and connector punctuations (Pc).

Am I the first to notice how unsuitable these characters are? Many of
these would be utterly invisible ("variation selectors" are Mn) or
displayed out of sequence (overlays are Mn),  or normalized away
(combining accents are Mn) or absurdly strange and ambiguous (roman
numerals are Nl, for instance).

Lorenzo Gatti