Multibyte Character Support for Python

John Roth johnroth at ameritech.net
Fri May 10 19:14:23 EDT 2002


"Martin v. Loewis" <martin at v.loewis.de> wrote in message
news:m3wuucbjlb.fsf at mira.informatik.hu-berlin.de...
> Erno Kuusela <erno-news at erno.iki.fi> writes:
>
> > | You mean, non-english-speaking people are prevented from using FORTRAN
> > | and C? Can you name someone specifically? I don't know any such person.
> >
> > i don't know such people either. but since many people only know
> > languages that aren't written in ascii, it seems fairly probable that
> > they exist.
>
> I really question this claim. Most people that develop software (or
> would be interested in doing so) will learn the latin alphabet at
> school - even if they don't learn to speak English well.

The trouble is that while almost all of the languages used in the
Americas, Australia and Western Europe are written in the Latin
alphabet, that isn't true in the rest of the world. Even where it is,
things get uncomfortable if your particular language's diacritical
marks aren't supported: you can't write really good, descriptive names.

And good, descriptive names are one of the bedrocks of
good software.

I'd very much prefer that this issue get faced head on and
solved cleanly, although I doubt that it will be solved before
Python 3.0.

The way I'd suggest it is quite simple:

1. In Python 3.0, the input character set is Unicode, encoded as either
UTF-16 or UTF-8. (I'm not prepared to make a solid argument one way or
the other at this time.)

2. All identifiers MUST be expressed in the character set of
a single language (treating the various Latin-derived languages
as one for simplicity). That doesn't mean that only one language
can be used for a module, only that a particular identifier must make
lexical sense in a specific language.

3. There must be a complete set of syntax words in each
supported language, that is, words such as 'and', 'or', 'if' and 'else'.
All such syntax words in a particular module must come from the
same language.

4. All syntax words are preceded by a special character, which
is not presented to the viewer by Python 3.0 aware tools. Instead,
the special character is used to pick them out and highlight them.
The reason for this is that the vocabulary of syntax words can then
be expanded without impacting existing programs: they are
effectively in a different namespace.
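
To make points 2 and 4 a bit more concrete, here is a rough sketch in
present-day Python. None of it is part of the proposal itself: the
function names are made up, the marker character is picked arbitrarily
(the proposal doesn't name one), and the script test is only a heuristic
based on the first word of each character's Unicode name, since the
standard unicodedata module has no script property. A real check would
also have to allow for languages such as Japanese that legitimately mix
several scripts.

```python
import unicodedata

# Hypothetical marker; the proposal does not say which character to use.
KEYWORD_MARKER = "\u00a7"  # section sign, chosen only for illustration


def char_script(ch):
    """Rough script guess: the first word of the Unicode character name."""
    # Heuristic only: e.g. 'LATIN SMALL LETTER A' gives 'LATIN'.
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def identifier_scripts(identifier):
    """Return the set of scripts used by the letters of an identifier."""
    # Digits and underscores are ignored; only letters are classified.
    return {char_script(ch) for ch in identifier if ch.isalpha()}


def is_single_script(identifier):
    """Point 2: every letter of the identifier comes from one script."""
    return len(identifier_scripts(identifier)) <= 1


def split_tokens(source):
    """Point 4: separate marked syntax words from ordinary names."""
    keywords, names = [], []
    for token in source.split():
        if token.startswith(KEYWORD_MARKER):
            keywords.append(token[len(KEYWORD_MARKER):])
        else:
            names.append(token)
    return keywords, names


def display_form(source):
    """What a marker-aware tool might show: the marker is hidden."""
    return source.replace(KEYWORD_MARKER, "")
```

With this, is_single_script("na\u00efve_name") is accepted because the
diacritic is still a Latin letter, while an identifier that mixes Latin
and Cyrillic letters is rejected. Because marked words and plain names
are collected separately, a program that already uses some word as a
variable name keeps working if that word later becomes a syntax word.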


>
> Regards,
> Martin




