PEP 3131: Supporting Non-ASCII Identifiers

Tue May 15 12:48:44 EDT 2007

On 15 May, 17:41, Stefan Behnel <stefan.behnel-n05... at web.de> wrote:
>

[javac -encoding Latin-1 Hallo.java]

> From a Python perspective, I would rather call this behaviour broken. Do I
> really have to pass the encoding as a command line option to the compiler?

They presumably weighed up the alternatives and decided that the most
convenient approach (albeit for the developers of Java) was to provide
such a compiler option. Meanwhile, developers get to write their
identifiers in the magic platform encoding, which isn't generally a
great idea but probably works well enough for some people: their
editor saves their programs in some encoding and the Java compiler
happens to assume the same encoding when reading the file. I wouldn't
want to rely on such things myself, though. Alternatively, they can do
what Python programmers do now and specify the encoding, albeit on the
command line.
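
To make the "happens to choose the same encoding" point concrete, here is
a rough sketch in Python. It is only an illustration: the identifier
"größe" and the one-line source text are made up for the example, and it
models the editor and the compiler as nothing more than an encode step
and a decode step.

```python
# Hypothetical one-line source file using a non-ASCII identifier.
source_text = "größe = 1\n"

# The bytes an editor would write, depending on how it is configured.
latin1_bytes = source_text.encode("latin-1")
utf8_bytes = source_text.encode("utf-8")

# Editor and compiler agree on Latin-1: the text survives intact.
round_trip = latin1_bytes.decode("latin-1")

# Editor wrote UTF-8 but the reader assumes Latin-1: decoding does not
# fail, it silently yields a different (garbled) identifier.
mojibake = utf8_bytes.decode("latin-1")

# Editor wrote Latin-1 but the reader assumes UTF-8: decoding fails.
try:
    latin1_bytes.decode("utf-8")
    utf8_read_ok = True
except UnicodeDecodeError:
    utf8_read_ok = False

print(round_trip == source_text, mojibake == source_text, utf8_read_ok)
```

The middle case is the awkward one when sharing code, since nothing
complains and the identifier just quietly becomes something else.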

However, what I want to see is how people deal with such issues when
sharing their code: what are their experiences and what measures do
they mandate to make it all work properly? You can see some
discussions about various IDEs mandating UTF-8 as the default
encoding, along with UTF-8 being the required encoding for various
kinds of special Java configuration files. Is this because
heterogeneous technical environments even within the same cultural
environment cause too many problems?

> I find Python's source encoding much cleaner here, and even more so when the
> default encoding becomes UTF-8.

Yes, it should reduce confusion at a technical level. But what about
the tools, the editors, and so on? If every computing environment had
decent UTF-8 support, wouldn't it be easier to say that everything has
to be in UTF-8? Perhaps the developers of Java decided that the rules
should be deliberately vague to accommodate people who don't want to
think about encodings but still want to be able to use Windows Notepad
(or whatever) to write software in their own writing system.
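
For comparison, the in-file declaration Stefan refers to can be sketched
like this. It assumes a current Python 3 interpreter, where non-ASCII
identifiers are accepted; the file contents and the name "größe" are
invented for the example. The point is only that the encoding travels
with the file rather than with the command line.

```python
import io
import tokenize

# Hypothetical file contents, as bytes on disk, saved by a Latin-1 editor.
# The first line is a PEP 263 style encoding declaration.
source_bytes = "# -*- coding: latin-1 -*-\ngröße = 1\n".encode("latin-1")

# The standard library reads the declaration from the bytes themselves.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
decoded = source_bytes.decode(encoding)

# compile() accepts bytes and honours the same declaration, so no
# external option is needed to get the identifier right.
namespace = {}
exec(compile(source_bytes, "<hallo>", "exec"), namespace)

print(encoding, namespace["größe"])
```

Whether editors reliably respect such a declaration is, of course, the
separate question about tools.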

And then, what about patterns of collaboration between groups who have
been able to exchange software with "localised" identifiers for a
number of years? Does it really happen, or do IBM's engineers in China
or India (for example) have to write everything strictly in ASCII? Do
people struggle with characters they don't understand or does
copy/paste work well enough when dealing with such code?

Paul