Python Statements/Keyword Localization

Terry Reedy tjreedy at udel.edu
Wed Nov 25 16:26:00 EST 2009


Emanuele D'Arrigo wrote:
> Greetings everybody,
> 
> some time ago I saw a paper that used an XSL transformation sheet to
> transform (if I remember correctly) a Chinese xml file (inclusive of
> Chinese-script XML tags) into an XHTML file.
> 
> More recently you might have all heard how the ICANN has opened up the
> way for non-latin characters in domain names, so that we'll soon start
> seeing URLs using Russian, Asian and Arabic characters.
> 
> In this context I was wondering if there has ever been much thought
> about a mechanism to allow the localization not only of the strings
> handled by python but also of its built-in keywords, such as "if",
> "for", "while", "class" and so on.

There have been various debates and discussions on the topic. There has 
been slow movement away from ASCII-only in user code. (But not in the 
stdlib, nor will there be.)
1. Unicode data type.
2. Unicode allowed in comments and string literals.
This required input decoding and the coding cookie. This led, I believe 
somewhat accidentally, to
3. Extended ASCII (high bit set, for other European chars in various 
encodings) for identifiers.
4. (In 3.0) Unicode allowed for identifiers.
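
As a small illustration of change 4, here is a minimal sketch, assuming 
Python 3 and a UTF-8 source file. The names (raddoppia, größe, 
risultato) are made up for the example; the point is only that 
identifiers may be localized while the keywords stay fixed.

```python
import keyword

# Identifiers may be written in the programmer's own language...
def raddoppia(numero):
    return numero * 2

# ...and, since 3.0, may contain non-ASCII letters.
größe = 21
risultato = raddoppia(größe)

# The keyword list itself is fixed and English-only.
print("größe".isidentifier())
print(keyword.iskeyword("if"), keyword.iskeyword("se"))
```

Note that "def" and "return" still had to be written in English.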

Here is a version of the anti-customized-keyword position. Python is 
designed to be read by people. Currently, any programmer in the world 
can potentially read any Python program. The developers, especially 
Guido, like this. Fixed keywords are not an undue burden because any 
educated person should learn to read the Latin characters a-z and the 
digits 0-9, and Python has an intentionally short list of keywords that 
the developers are loath to lengthen.

Change 4 above inhibits universal readability. But once 3 happened and 
str became unicode, in 3.0, it was hard to say no to this.

A 'pro' argument: Python was designed for learning, is good for that, 
and *is* used in schools down to the elementary level. But kids cannot 
be expected to know foreign alphabets and words while still learning 
their own.

> For example, the following English-based
> piece of code:
> 
> class MyClass(object):
>     def myMethod(self, aVariable):
>          if aVariable == True:
>             print "It's True!"
>          else:
>             print "It's False!"
> 
> would become (in Italian):
> 
> classe LaMiaClasse(oggetto):
>     def ilMioMetodo(io, unaVariabile):
>          se unaVariabile == Vero:
>              stampa "E' Vero!"
>          altrimenti:
>              stampa "E' Falso!"
> 
> I can imagine how a translation script going through the source code
> could do a 1:1 keyword translation to English fairly quickly but this
> would mean that the runtime code still is in English and any error
> message would be in English.

This is currently seen as a reason not to have other keywords: 
translating them alone would do little good, since the library names 
and error messages would still be in English. A Python programmer must 
know minimal English, and the keywords are the least of the problem.
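
For what it is worth, the 1:1 translation script described in the 
quoted text could be sketched roughly as follows with the standard 
tokenize module. This is only an illustration under stated assumptions: 
the ITALIAN_TO_ENGLISH table, the translate_keywords function and the 
'ritorna' keyword are hypothetical, and 'def' is left in English as in 
the quoted example.

```python
import io
import tokenize

# Hypothetical translation table; no such table ships with Python.
ITALIAN_TO_ENGLISH = {
    "classe": "class",
    "se": "if",
    "altrimenti": "else",
    "ritorna": "return",
    "Vero": "True",
    "Falso": "False",
}

def translate_keywords(source, table=ITALIAN_TO_ENGLISH):
    """Replace NAME tokens found in table; leave everything else alone."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in table:
            tokens.append((tok.type, table[tok.string]))
        else:
            tokens.append((tok.type, tok.string))
    # With (type, string) pairs, untokenize produces valid code,
    # though the original spacing is not preserved.
    return tokenize.untokenize(tokens)

italian_source = (
    "def controlla(valore):\n"
    "    se valore == Vero:\n"
    "        ritorna \"E' Vero!\"\n"
    "    altrimenti:\n"
    "        ritorna \"E' Falso!\"\n"
)

english_source = translate_keywords(italian_source)
namespace = {}
exec(english_source, namespace)
print(namespace["controlla"](True))
```

Because the replacement works on tokens, string literals and comments 
are untouched. It cannot, however, tell a translated keyword from an 
ordinary identifier that happens to have the same spelling (a variable 
named 'se', say), and tracebacks would refer to the translated English 
source, which is the problem noted above.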

I can imagine that there could be a mechanism for extracting error 
messages and replacing them with translations, like there is for Python 
code, but I do not know whether it will ever happen with haphazard 
volunteer work or will require grant sponsorship.
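
To make the idea concrete, here is a rough sketch of message 
replacement done outside the interpreter. Everything in it is an 
assumption for illustration: the MESSAGE_PATTERNS catalogue, the 
Italian wording, and the translate_message and describe helpers are 
hypothetical, and the English message texts are not a stable 
interface, so a real catalogue would need maintenance for every 
release.

```python
import re

# Hypothetical catalogue: English message pattern -> translated template.
MESSAGE_PATTERNS = [
    (re.compile(r"^name '(?P<name>.+)' is not defined$"),
     "il nome '{name}' non è definito"),
    (re.compile(r"^division by zero$"),
     "divisione per zero"),
]

def translate_message(message):
    """Return the translated message, or the original if none matches."""
    for pattern, template in MESSAGE_PATTERNS:
        match = pattern.match(message)
        if match:
            return template.format(**match.groupdict())
    return message

def describe(exc):
    """Format an exception with its message translated where possible."""
    return "{}: {}".format(type(exc).__name__, translate_message(str(exc)))

print(describe(NameError("name 'x' is not defined")))
```

Untranslated messages fall through unchanged, so a partial catalogue 
degrades to English rather than failing. The exception class names 
remain in English in this sketch.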

> I can also imagine that it should be
> possible to "simply" recompile python to use different keywords, but
> then all libraries using the English keywords would become
> incompatible, wouldn't they?
> 
> In this context it seems to be the case that the executable would have
> to be able to optionally accept -a list- of dictionaries to internally
> translate to English the keywords found in the input code and at most -
> one- dictionary to internally translate from English output messages
> such as a stack trace.
> 
> What do you guys think?

I would like anyone in the world to be able to use Python, and I would 
like Python programmers to potentially be able to read any Python code 
and not have the community severely balkanized. To me, this would 
eventually mean both native keywords and transliteration from other 
alphabets and scripts to Latin chars. Not an easy project.

Terry Jan Reedy



