Multibyte Character Support for Python
gbreed at cix.compulink.co.uk
Tue May 14 09:31:12 EDT 2002
Kragen Sitaker wrote:
> I agree that programming language keywords should not be localized;
> the notations for iteration, conditionals, math, abstraction,
> application, and so forth, should not vary by language. It is
> perfectly acceptable for a person who does not speak English to learn
> "if", "for", "except", and so forth, in order to speak Python; the
> vocabulary is quite small. It is no different from American musicians
> having to learn "allegro", "D.C. al fine", and "tremolo" --- it simply
> doesn't add significantly to the difficulty of the notation.
I disagree. I wouldn't object if a language used "si" or "weil" instead
of "if". But I sure as heck wouldn't want to use a Chinese character. No
matter how good a programming language is, if it requires the use of
Chinese characters I'm not touching it. I wouldn't expect a monolingual
Chinese speaker to feel any better about Python. Remember, the subject is
"multibyte character support", not "alternative European code page
support".
> But variable and function names belong to the programmer and the
> program's audience, not the notation, and should be written in the
> language that affords these people the most expressive power.
Yes, but you can write any language using the Roman alphabet. If you can
learn to use that alphabet for the keywords, you can translate variable
names as well. Beyond that it's only a matter of convenience, mostly for
speakers of European languages that use accented characters.
Is it such a big problem to lose the accents? You still have to deal with
a standard library built around English. And there are all kinds of
problems that arise when you use arbitrary character sets. For example
(hoping these come out right), à and á can look similar from a distance,
as can "Latin Small Letter A With Macron". Would you feel confident
distinguishing ã and ä on a low resolution monitor? What happens if you
receive code that uses a character set you don't have a font for? If you
look through some Unicode tables you'll see characters that look
identical, and in some cases are defined to be identical. Does the
interpreter have to keep a lookup table of equivalences? How does it know
what constitutes a "letter" in the first place?
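To make the last two questions concrete, here is a rough sketch, assuming
a current Python 3 interpreter and its standard unicodedata module (both
postdate this thread). The helper names same_text and is_letter are
hypothetical, chosen for illustration. It shows one possible answer: the
"lookup table of equivalences" is Unicode canonical normalization, and a
"letter" is any code point whose general category starts with "L".

```python
import unicodedata

# Two code points that render alike and are canonically equivalent.
angstrom_sign = "\u212b"   # ANGSTROM SIGN
a_with_ring = "\u00c5"     # LATIN CAPITAL LETTER A WITH RING ABOVE

# Precomposed and combining-sequence spellings of the same letter.
precomposed = "\u00e1"     # LATIN SMALL LETTER A WITH ACUTE
decomposed = "a\u0301"     # "a" followed by COMBINING ACUTE ACCENT


def same_text(left: str, right: str) -> bool:
    """Compare two strings after canonical (NFC) normalization."""
    nfc_left = unicodedata.normalize("NFC", left)
    nfc_right = unicodedata.normalize("NFC", right)
    return nfc_left == nfc_right


def is_letter(char: str) -> bool:
    """Treat any code point in a Unicode 'L*' general category as a letter."""
    return unicodedata.category(char).startswith("L")


if __name__ == "__main__":
    print(angstrom_sign == a_with_ring)          # raw code points differ
    print(same_text(angstrom_sign, a_with_ring)) # equivalent once normalized
    print(same_text(precomposed, decomposed))
    print(is_letter("\u0101"), is_letter("1"))   # a-with-macron vs. a digit
```

Note that normalization only settles the "defined to be identical" cases.
Characters that merely look alike, such as à and á, remain distinct, so
the readability worry above is untouched by it.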
I don't know if it's for English speakers to comment on, but I feel uneasy
about such a change. If the parser could recognise arbitrary characters,
the regular expressions knew what a letter was independent of locale, and
Unicode strings could be reliably compared, then at least the
implementation would be easy. But I can see people shooting themselves in
the foot as easily as they do with pointer arithmetic. Still, write a PEP
if you know exactly what you want. I could sleep much easier knowing such
a proposal had been definitively rejected.
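As an illustration only, and again assuming a current Python 3 rather than
the Python of this thread, the first two conditions can be probed with the
standard re module and str.isidentifier. The function names below are
hypothetical. For str patterns, \w follows Unicode rules unless re.ASCII
is given, so it does not depend on the locale.

```python
import re

# For str patterns in Python 3, \w is Unicode-aware by default.
UNICODE_WORD = re.compile(r"\w+")
# re.ASCII restricts \w to [a-zA-Z0-9_].
ASCII_WORD = re.compile(r"\w+", re.ASCII)


def unicode_words(text: str) -> list:
    """Split text into runs of Unicode word characters."""
    return UNICODE_WORD.findall(text)


def ascii_words(text: str) -> list:
    """Split text into runs of ASCII-only word characters."""
    return ASCII_WORD.findall(text)


def usable_as_name(text: str) -> bool:
    """Ask the interpreter whether text is lexically a valid identifier."""
    return text.isidentifier()


if __name__ == "__main__":
    sample = "na\u00efve caf\u00e9 \u53d8\u91cf"
    print(unicode_words(sample))   # accented and Chinese words stay whole
    print(ascii_words(sample))     # accented words are cut at the accent
    print(usable_as_name("caf\u00e9"), usable_as_name("2x"))
```

Whether this makes foot-shooting any less likely is a separate question;
the sketch only shows that the mechanical parts are tractable.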
Graham