PEP 3131: Supporting Non-ASCII Identifiers

Stefan Behnel stefan.behnel-n05pAM at web.de
Tue May 15 10:39:56 EDT 2007


Paul Boddie wrote:
> what I'd like to see, for a change, is some kind
> of analysis of the prior art in connection with this matter. Java has
> had extensive UTF-8 support all over the place for ages, but either no-
> one here has any direct experience with the consequences of this
> support, or they are more interested in arguing about it as if it were
> a hypothetical situation when it is, in fact, a real-life situation
> that can presumably be observed and measured.

It's difficult to extract this analysis from Java. Most people I know from the
Java world do not use this feature because it is error-prone. Java does not
support *explicit* source encodings declared in the file itself, i.e. unless
the compiler is told otherwise, the local environment settings win. This is
bound to fail e.g. on a latin-1 system where I would like to work with UTF-8
files (which tend to work better on the Unix build server, etc.).

In the Python world, these problems are already solved by the encoding tag and
will disappear entirely when UTF-8 becomes the default encoding (note that this
does not invert the problem, as people using non-UTF-8 encodings will then just
set the respective encoding tag in their files). So there is not much Python
can learn from Java here except for what it already does better.
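
To make the encoding tag point concrete, here is a minimal sketch, assuming a
Python 3 interpreter where UTF-8 is already the default and non-ASCII
identifiers are allowed. The identifier and the pseudo file names are made up
for illustration.

```python
# Assumes Python 3 (UTF-8 default source encoding, PEP 3131 identifiers).
# The identifier and the "<...>" file names are hypothetical.
source_text = "# -*- coding: latin-1 -*-\ngröße = 42\n"

# The bytes as they would sit on disk in a Latin-1 encoded file.
latin1_bytes = source_text.encode("latin-1")

# The encoding tag on the first line tells the compiler how to decode them.
namespace = {}
exec(compile(latin1_bytes, "<latin1_example>", "exec"), namespace)
tagged_value = namespace["größe"]

# The same bytes without the tag are decoded as UTF-8 and get rejected.
untagged_bytes = latin1_bytes.split(b"\n", 1)[1]
try:
    compile(untagged_bytes, "<untagged_example>", "exec")
    untagged_ok = True
except (SyntaxError, UnicodeDecodeError):
    untagged_ok = False
```

The file says what it is, so the result does not depend on the locale of
whoever happens to run the build.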

I am actually working on a couple of Java projects that use German
identifiers, transliterated to prevent the encoding problems inherent to Java.
The transliteration makes things harder to read than necessary - and this is
only German-vs-English, i.e. simple things like 'ae' instead of 'ä' and 'ss'
instead of 'ß'. But sometimes names become hard to read that way or look like
different words. It also leads to all sorts of weirdly mixed names, because
sometimes it is easier to write the similar-looking (although maybe not
completely synonymous) English word instead of the transliterated German one.
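
A small sketch of the "look like different words" problem, again assuming
Python 3. The mapping table and the example words are my own illustration, not
taken from the projects mentioned above.

```python
# Hypothetical transliteration table of the kind used to keep names ASCII-only.
TRANSLITERATION = {
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}

def transliterate(name):
    """Replace German umlauts and 'ß' with their usual ASCII spellings."""
    return "".join(TRANSLITERATION.get(ch, ch) for ch in name)

# 'Maße' (measurements) and 'Masse' (mass) are two different words ...
words = ["Maße", "Masse"]
ascii_names = [transliterate(word) for word in words]

# ... but after transliteration they can no longer be told apart.
collision = ascii_names[0] == ascii_names[1]

# With Unicode identifiers, both spellings are valid names in their own right.
both_valid = all(word.isidentifier() for word in words)
```

Here `collision` ends up true: the reader of the ASCII name cannot tell which
word was meant without looking at the surrounding code.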

So, yes, in a way, the code quality in these projects suffers from developers
not being able to freely write Unicode identifiers.

Stefan


