PEP 3131: Supporting Non-ASCII Identifiers

Mon May 14 07:45:12 EDT 2007

Martin v. Löwis wrote:
> PEP 1 specifies that PEP authors need to collect feedback from the
> community. As the author of PEP 3131, I'd like to encourage comments
> to the PEP included below, either here (comp.lang.python), or to
> python-3000 at python.org
> 
> In summary, this PEP proposes to allow non-ASCII letters as
> identifiers in Python. If the PEP is accepted, the following
> identifiers would also become valid as class, function, or
> variable names: Löffelstiel, changé, ошибка, or 売り場
> (hoping that the latter one means "counter").
> 
> I believe this PEP differs from other Py3k PEPs in that it really
> requires feedback from people with different cultural background
> to evaluate it fully - most other PEPs are culture-neutral.
> 
> So, please provide feedback, e.g. perhaps by answering these
> questions:
> - should non-ASCII identifiers be supported? why?
> - would you use them if it was possible to do so? in what cases?

I strongly prefer to stay with the current standard, which limits
identifiers to ASCII.

Ideally, it would be nice to have Greek letters for some scientific
variables, or for French speakers to use éèçà in names...

But... (I share the common objections):

* Where are they on my keyboard, and how can I type them?
(I can find the French éèçà, but a US-layout keyboard doesn't have them;
imagine kanji or Greek.)

* How do I spell this Cyrillic/kanji character?

* When there are very similar characters, how can I distinguish them?
(Not to mention characters that look identical but have different
Unicode names.)

* Are the variables "amédé" and "amede" the same?

* It's an anti-KISS rule.

* I don't only write code, I read it too, and allowing this much
variation in names makes code much less readable
(unless I learn other writing systems, which may not be a bad thing in
itself, but is not the objective here).

* I've read "Restricting the language to ASCII-only identifiers does
not enforce comments and documentation to be English, or the identifiers
actually to be English words, so an additional policy is necessary,
anyway."
But even with comments in German, Spanish, or Japanese, I can usually
work out what (well-written) code is doing with its data. That would be
very difficult with identifiers spanning all of Unicode.
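
To make the "similar characters" and "amédé vs amede" questions above
concrete, here is a minimal sketch using only the standard unicodedata
module. It assumes a Python 3 interpreter, which goes beyond what
existed when this was posted; the helper name describe() is
hypothetical and is not part of the PEP.

```python
import unicodedata


def describe(name):
    """Hypothetical helper: list (code point, Unicode name) per character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in name]


latin = "a"          # U+0061
cyrillic = "\u0430"  # looks like "a" in many fonts, but is another character

# The two lookalikes are different strings, so they would be different names.
print(latin == cyrillic)
print(describe(latin))
print(describe(cyrillic))

# An accented and an unaccented spelling are simply different strings.
print("am\u00e9d\u00e9" == "amede")

# The same accented word can be stored composed (é as one code point) or
# decomposed (e followed by a combining acute accent).
composed = "am\u00e9d\u00e9"
decomposed = unicodedata.normalize("NFD", composed)
print(composed == decomposed)

# NFKC normalization maps both spellings to the same composed form.
print(unicodedata.normalize("NFKC", decomposed) == composed)
```

Printing the code points is the only reliable way shown here to tell
such lookalikes apart; the rendered glyphs alone do not help. As far as
I know, Python 3 applies NFKC to identifiers while parsing, so the
composed and decomposed spellings would name the same variable, whereas
"amédé" and "amede" would stay distinct.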

==> I wouldn't use them.

So, keep ASCII only.
Basic ASCII is the lowest common denominator, known and available
everywhere; all developers know it and can identify its characters
correctly (though 1 vs I or O vs 0 can cause problems with poor
fonts).

Maybe make the default file encoding UTF-8 and make strings Unicode
strings by default (with an s"" prefix for basic strings, for example),
but that is another problem.

L.Pointal.