PEP 3131: Supporting Non-ASCII Identifiers

Fri May 18 00:45:30 EDT 2007

> Possibly.  One Java program I remember had Japanese comments encoded
> in Shift-JIS.  Will Python be better here?  Will it support the source
> code encodings that programmers around the world expect?

It's not a question of "will it". It does today, starting from Python 2.3.
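
To make that concrete, here is a minimal sketch of a PEP 263 encoding
declaration at work. It is written for Python 3 rather than 2.3, and
the sample source text and the names `source_text`, `source_bytes`,
`greeting` and `namespace` are made up for illustration; only
`tokenize.detect_encoding` comes from the standard library.

```python
import io
import tokenize

# Hypothetical source file contents, stored on disk as Shift-JIS bytes.
source_text = (
    "# -*- coding: shift_jis -*-\n"
    "# 日本語のコメント\n"
    'greeting = "こんにちは"\n'
)
source_bytes = source_text.encode("shift_jis")

# Read the PEP 263 declaration from the first lines of the raw bytes.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)

# Decode with the declared encoding and run the module body.
namespace = {}
exec(compile(source_bytes.decode(encoding), "<example>", "exec"), namespace)
```

In normal use none of this is spelled out by hand: the interpreter
reads the declaration itself when it loads the file.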

>> Another possible reason is that the programmers were unsure whether
>> non-ASCII identifiers are allowed.
> 
> If that's the case, I'm not sure how you can improve on that in Python.

It will change on its own over time. "Not allowed" could mean "not
permitted by policy". Indeed, the PEP explicitly mandates a policy
that bans non-ASCII characters from source (whether in identifiers or
comments) for Python itself, and encourages other projects to define
similar policies. Which projects adopt such a policy, or pick a
different one (e.g. all comments must be in Korean), remains to
be seen.

Then, programmers will not be sure whether the language and the tools
allow it. For Python, it will be supported from 3.0, so people will
initially worry about whether their code needs to run on older Python
versions. By the time Python 3.5 comes along, people will hopefully
have lost interest in supporting 2.x, so they will start using 3.x
features, including this one.
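
As a rough illustration of what the feature looks like, here is a
small sketch of code with non-ASCII identifiers. The names `größe`,
`площадь`, `сторона`, `names_ok` and `result` are invented for this
example, and it assumes a Python 3 interpreter and a source file saved
as UTF-8.

```python
# Requires Python 3.0 or later; under 2.x this file is a SyntaxError.
größe = 3  # German: "size"


def площадь(сторона):
    """Return the area of a square (Russian names: 'area', 'side')."""
    return сторона ** 2


# str.isidentifier() reports whether a string is a valid identifier.
names_ok = all(
    name.isidentifier() for name in ("größe", "площадь", "сторона")
)
result = площадь(größe)
```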

Now, it may be tempting to say "ok, so let's wait until 3.5, if people
won't use it before anyway". That is specious logic: if we add it only
to 3.5, people won't be using it before 4.0. *Any* new feature
takes several years to gain wide acceptance, but years pass
surprisingly fast.

> There are lots of possible reasons why all these programmers around
> the world who want to use non-ASCII identifiers end up not using them.
> One is simply that very few people ever really want to do so.  However,
> if you're to assume that they do, then you should look at the existing
> practice in other languages to find out what they did right and what
> they did wrong.  You don't have to speculate.

That's indeed how this PEP came about. There were early adopters, like
Java, then experience gained from it (resulting in PEP 263, implemented
in Python 2.3 on the Python side, and resulting in UAX#39 on the Unicode
Consortium side), and that experience now flows into PEP 3131.

If you think I speculated in reasoning why people did not use the
feature in Java: sorry for expressing myself unclearly. I know for
a fact that the reasons I suggested were actual reasons given by
actual people. I'm just not sure whether this was an exhaustive
list (because I did not interview every programmer in the world),
and what statistical relevance each of these reasons had (because
I did not conduct a scientific study to gather statistically
relevant data on usage of non-ASCII identifiers in different
regions of the world).

Regards,
Martin