PEP 3131: Supporting Non-ASCII Identifiers

Fri May 18 05:06:17 EDT 2007

Long and interresting discussion with different point of view.

Personnaly, even if the PEP goes (and its accepted), I'll continue to use
identifiers as currently. But I understand those who wants to be able to
use chars in their own language.

* for people which are not expert developers (non-pros, or in learning
context), to be able to use names having meaning, and for pro developers
wanting to give a clear domain specific meaning - mainly for languages non
based on latin characters where the problem must be exacerbated.
They can already use unicode in strings (including documentation ones).

* for exchanging with other programing languages having such identifiers...
when they are really used (I include binding of table/column names in
relational dataabses).

* (not read, but I think present) this will allow developers to lock the
code so that it could not be easily taken/delocalized anywhere by anybody.

In the discussion I've seen that problem of mixing chars having different
unicode number but same representation (ex. omega) is resolved (use of an
unicode attribute linked to representation AFAIU).

I've seen (on fclp) post about speed, it should be verified, I'm not sure we
will loose speed with unicode identifiers.

On the unicode editing, we have in 2007 enough correct editors supporting
unicode (I configure my Windows/Linux editors to use utf-8 by default).

I join concern in possibility to read code from a project which may use such
identifiers (i dont read cyrillic, neither kanji or hindi) but, this will
just give freedom to users.

This can be a pain for me in some case, but is this a valuable argument so
to forbid this for other people which feel the need ?

IMHO what we should have if the PEP goes on:

* reworking on traceback to have a general option (like -T) to ensure
tracebacks prints only pure ascii, to avoid encoding problem when
displaying errors on terminals.

* a possibility to specify for modules that they must *define* only
ascii-based names, like a from __futur__ import asciionly. To be able to
enforce this policy in projects which request it.

* and, as many wrote, enforce that standard Python libraries use only ascii
identifiers.