Multibyte Character Support for Python

Tue May 14 10:15:44 EDT 2002

jacob at boris.cd.chalmers.se.cd.chalmers.se (Jacob Hallen) writes:

> I am Swedish and English is not my first language.
> 
> My view is that Python source code should be UTF-8, so that you can represent
> multilingual strings in a readable way. However, I still think that
> identifiers should be limited to ASCII. 
> 
> Just like music score is the common language for written music, English
> based programming languages have become the common base for programming.
> Just like you have to learn how to read music score (unless you have a
> perfect memory for tunes) to perform other people's music, you need to learn
> basic English in order to make your programs readable by others and be
> able to read other people's code.

I think this analogy limps a bit. People all over the world *do* perform other
people's music without having learned to read (Western) musical scores. Indeed,
it would be a major cultural catastrophe, were it otherwise. Also, the amount of
effort that is required for someone with a sufficiently different language
background to learn English well enough to come up with good interface names
and documentation is in no way comparable to the amount of effort involved in
learning to read musical scores. And to say that those who haven't enough
English (or need to name things for which there are no English words) should
then at least stick to the 26 alphabetic characters is like telling English
users of FOO to use Chinese or at least transliterate the Roman alphabet into
Chinese, because Chinese is *the* language to do FOO in.

People might be willing to do this if FOO is a really big thing in their lives
but not if they just think FOO sounds like an interesting thing and they want
to learn about it, or FOO might help them with some other problems.

Would you tell an American kid interested in learning FOO to go and learn
Chinese first? Even if FOO had nothing to do with China and Chinese culture as
such?

> 
> I understand the attraction of using your native language for identifiers
> and comments, but it is really the dark side of the source.

Yes, there *are* big advantages associated with sticking with English for all
code, but you have to acknowledge that these advantages already come
comparatively free to you (and me, and Alex Martelli), since we happen to have
mastered English to a considerable extent (and it wasn't all that difficult,
given that Swedish, Italian, German and English are not all that
different). To many people outside Europe this doesn't apply, not even in a
rich country with an excellent education system such as Japan (one of the
reasons, I suspect, why Sun was clever enough to indulge the Japanese a bit
with Japanese-language docs and the like). If Python strives to bring
programming to the masses, and those masses are not all situated in Europe,
America (and a few select ex-colonies etc.), then Unicode strings might not be
enough, so I think it's necessary to think about which audience one wants to
accommodate and at whose expense.

alex

P.S.: I will also admit a slight fancy for Greek identifiers in math code :)
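
For illustration only, here is a minimal sketch of what such math code could
look like. It assumes Python 3, which accepts non-ASCII identifiers (PEP 3131);
the Python discussed in this thread does not. The function name normal_pdf and
its parameters are made up for the example.

```python
import math


def normal_pdf(x, μ=0.0, σ=1.0):
    """Density of the normal distribution with mean μ and standard deviation σ."""
    # Standardize x, then apply the usual Gaussian density formula.
    z = (x - μ) / σ
    return math.exp(-0.5 * z * z) / (σ * math.sqrt(2.0 * math.pi))


# Greek names also work for plain variables (hypothetical example values).
π = math.pi
area = π * 2.0 ** 2  # area of a circle with radius 2
```

Whether this reads better than mu, sigma and pi probably depends on who has to
type and maintain it, which is the audience question raised above.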