[Python-3000] PEP 3131 accepted

Wed May 23 06:20:31 CEST 2007

On 5/22/07, Ka-Ping Yee <python at zesty.ca> wrote:
> Python fits your brain.  Let's keep it that way.

I'm sorry, Ping, but you sound just like I was feeling about the PEP
at the start (and many others were too). You missed a bunch of
enlightening posts from people with quite a different perspective.

Particularly helpful were a couple of reports from the Java
world, where Unicode letters in identifiers have been legal for a long
time now. (JavaScript also supports this BTW.) The Java world has not
fallen apart, but Java programmers in countries where English is not
spoken regularly between programmers (e.g. Japan) find it very helpful
to be able to communicate with each other through identifiers in their
own language. Remember the mantra that *human* readability of code is
important? Well, it helps if your code can use at least some of the
language spoken by those humans.

Of course, even Japanese programmers must master *some* English -- the
standard library and the language keywords are still in English, and
they are okay with that. But the code they write for each other to
read will be more readable *to them* if they don't have to resort to
Latin transliterations of Japanese words. Because that's what they do
today. And they don't like it. Their code is already unreadable for us
(for me, anyway :-) -- their comments are in Japanese (that's legal
today) and so are their output messages (that's also legal today).

My own personal example would be a program calculating Dutch income
tax -- I'd be crazy trying to translate the Dutch tax-technical terms
into English, and since the idiosyncrasies of taxes are utterly
localized, there would be no use for my program in other countries.
Now Dutch can (for the most part, without much loss of readability) be
written in ASCII, but the same idea of course applies to any
application of local law, customs etc.

Of course, for the standard library, there's a strict style rule
requiring only ASCII in identifiers, and using English for names,
comments and messages. A similar style guide is likely to be adopted
by other global open source projects. But there are lots of regional
open source projects too, and they can standardize on a different
common language.

Will there be occasional pain when someone writes a useful hack using
their local language and finds they have to translate it to English in
order to open source it? Sure. But the pain already exists if they
chose to use their own language for comments, messages, or even
identifiers (transliterated to the Latin alphabet). I don't expect
there to be much additional pain.

> > > PEP 3131 will also cause problems for code review.  Because many
> > > characters have indistinguishable appearances, there will be no
> > > mapping between what you see when you look at code and what the code
> > > actually says.

I trust most programmers to *want* to write clear code, so they will
steer clear of such things. If someone wants to obfuscate their code
they already have plenty of opportunities (even in Python!). The
problem is no worse than the lack of difference between 1 and l in
some fonts, and between l and I in others (and there are even fonts
where o and 0 look the same).
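
To make the look-alike concern concrete, here is a minimal sketch of how a
reviewer could tell two visually identical identifiers apart. It is only an
illustration: the helper names describe() and is_ascii_only() are made up for
this example and are not part of the PEP or of any tool mentioned in this
thread.

```python
import unicodedata

latin_a = "a"          # U+0061 LATIN SMALL LETTER A
cyrillic_a = "\u0430"  # U+0430 CYRILLIC SMALL LETTER A, renders like "a" in most fonts


def describe(name):
    """Return (code point, Unicode name) for every character in an identifier."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in name]


def is_ascii_only(name):
    """True if the identifier would pass an ASCII-only style rule."""
    return all(ord(ch) < 128 for ch in name)


# Two names that look the same on screen but are distinct strings,
# and therefore distinct dictionary keys (and distinct identifiers).
names = {"p" + latin_a + "y": 1, "p" + cyrillic_a + "y": 2}

for name in names:
    print(is_ascii_only(name), describe(name))
```

A project with an ASCII-only style rule could run a check like is_ascii_only()
over its identifiers; a project that allows local-language names could print
describe() output for anything that looks suspicious during review.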

> Assigning blame elsewhere will not make the problem go away.

You may be misunderstanding the enthusiasm of your respondent.

> We do
> not incorporate buggy libraries into the Python core and then absolve
> ourselves by pointing fingers at the library authors; we should not
> incorporate the complicated and unsolved problems of international
> character sets into the language syntax definition, thereby turning
> them from problems with Unicode to problems with Python.

Yes, Unicode has its problems (so does ASCII BTW). But they can be
solved (see: Java and JavaScript). The Unicode standard also has some
guidelines. Solutions are actively being discussed on this list. If
you have any experience with other languages or fonts, please help. We
should probably be conservative; I'm not too hopeful about support for
right-to-left alphabets for example. But we can do better than ASCII
(or Latin-1, which is much worse).
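
As a rough illustration of what following the Unicode guidelines can look
like, the sketch below uses the approach Python 3 eventually shipped:
identifiers are normalized to NFKC and then checked against the identifier
rules. The function name normalized_identifier() is hypothetical and exists
only for this example; unicodedata.normalize() and str.isidentifier() are
standard library calls in Python 3.

```python
import unicodedata


def normalized_identifier(name):
    """Normalize a candidate identifier to NFKC and validate it.

    Hypothetical helper for illustration; raises ValueError if the
    normalized text is not a legal Python 3 identifier.
    """
    canonical = unicodedata.normalize("NFKC", name)
    if not canonical.isidentifier():
        raise ValueError(f"not a valid identifier: {name!r}")
    return canonical


# The single-character ligature U+FB01 folds to the two letters "fi",
# so both spellings end up naming the same thing.
print(normalized_identifier("\ufb01le") == "file")

# Non-Latin letters are accepted as they are.
print(normalized_identifier("\u7a0e\u984d"))

# A leading digit is still rejected, exactly as with ASCII names.
try:
    normalized_identifier("2x")
except ValueError as exc:
    print(exc)
```

Normalization removes one class of look-alikes (compatibility forms such as
ligatures and full-width letters), but it does not merge characters from
different scripts, so it complements review rather than replacing it.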

Cheers,

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)