[Python-Dev] PEP 263 considered faulty (for some Japanese)

M.-A. Lemburg mal@lemburg.com
Tue, 12 Mar 2002 10:02:52 +0100


SUZUKI Hisao wrote:
> 
>    I am a Japanese fan/developer/user of Python for years.  I
> have recently read the PEP 263 --- Defining Python Source Code
> Encodings.  I have been discussing about it on the Japanese
> mailing list of Python last week, and I and others found a
> severe fault in it.
>    I have also read the Parade of the PEPs and know that it is
> very close to being checked in, so I am writing this message to
> you in English in a hurry.  The PEP 263, as is, will damage the
> usability of Python in Japan.

I certainly hope not, since the PEP was specifically written
to address those parts of the world which do not use ASCII or
Latin-1 as their common encoding.

Reading your comments, though, I believe that the PEP actually
does help in your case too:

All you have to do is be explicit in the coding header of
a source file rather than not using such a header at all.

So in the end, you have to change one line per Python source
script, telling the interpreter what encoding the file
uses, and you're done.

Even though this requires a bit of work, in the end, I believe
that it is a net win, since you no longer have to maintain
magic data about the file via some other means.
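
To illustrate the one-line change, here is a minimal sketch written
for a current Python 3 interpreter. The script text and the name
`greeting` are made up for the example; only the coding header form
comes from the PEP.

```python
# Sketch only: the script text and the name `greeting` are hypothetical.
# Build the bytes of a small EUC-JP script whose first line is the
# PEP 263 coding header.
script_bytes = (
    "# -*- coding: euc-jp -*-\n"
    "greeting = '\u3053\u3093\u306b\u3061\u306f'\n"
).encode("euc-jp")

# compile() reads the header from the byte string and decodes the rest
# of the source with the declared encoding.
namespace = {}
exec(compile(script_bytes, "<example>", "exec"), namespace)

assert namespace["greeting"] == "\u3053\u3093\u306b\u3061\u306f"
```

The same bytes without the header line would have to be decoded with
whatever default applies, which is exactly the ambiguity the header
removes.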
 
>    The PEP says, "Just as in coercion of strings to Unicode,
> Python will default to the interpreter's default encoding (which
> is ASCII in standard Python installations) as standard encoding
> if no other encoding hints are given."  This will let many
> English people free from writing the magic comment to their
> scripts explicitly.  However, many Japanese set the default
> encoding other than ASCII (we use multi-byte encodings for daily
> use, not as luxury), and some Japanese set it, say, "utf-16".

This only applies if the interpreter does not find a
coding header. 

Strangely enough, I changed the above lines
in the PEP to meet the demands of a Japanese Python user
who uses two Japanese encodings on two different platforms.
They have the problem that they use CVS for the code and
thus can only have one coding header. One solution was to
not use the encoding header and set the default encoding
depending on the platform they run the code on. Another
solution involved a magic codec which determines its
encoding on a per-platform basis -- luckily the Python
codec registry is easily extensible, so this doesn't pose
much of a problem.
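
A rough sketch of what such a per-platform codec could look like,
written for a current Python 3 interpreter. The codec name
"japanese_platform" and the platform-to-encoding table are
assumptions made up for this example, not what that user actually
deployed; only `codecs.register()` and `codecs.lookup()` are real
library calls.

```python
import codecs
import sys

# Hypothetical names throughout: the codec name "japanese_platform" and
# this platform-to-encoding table are assumptions for illustration.
PLATFORM_ENCODINGS = {"win32": "shift_jis"}
FALLBACK_ENCODING = "euc_jp"
MAGIC_NAME = "japanese_platform"


def resolve_platform_encoding(platform=None):
    """Return the real encoding name to use on the given platform."""
    if platform is None:
        platform = sys.platform
    return PLATFORM_ENCODINGS.get(platform, FALLBACK_ENCODING)


def search_magic_codec(name):
    """Codec search function: answer only for the magic name."""
    if name != MAGIC_NAME:
        return None  # let other search functions handle it
    return codecs.lookup(resolve_platform_encoding())


# Add the search function to the codec registry.
codecs.register(search_magic_codec)

text = "\u3042"  # HIRAGANA LETTER A
encoded = text.encode(MAGIC_NAME)
```

With this in place the single name "japanese_platform" resolves to
Shift_JIS on Windows and to EUC-JP elsewhere, so code that names the
codec does not need to differ between platforms.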

BTW, using UTF-16 as default is a particularly bad choice...
you might as well stick to all Unicode then since Python
uses UCS-2 as internal storage format on narrow builds.
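
One practical consequence of a UTF-16 default can be seen in a small
sketch (an illustration only, using an explicit little-endian byte
order so the result does not depend on the machine): plain ASCII
source text is no longer byte-compatible with the default.

```python
# Sketch: compare how one ASCII-only source line is stored.
line = "import sys"

as_ascii = line.encode("ascii")
as_utf16 = line.encode("utf-16-le")  # explicit byte order, no BOM

# Every ASCII character picks up a zero byte in UTF-16 ...
assert len(as_utf16) == 2 * len(as_ascii)
assert b"\x00" in as_utf16
# ... so the two byte sequences are not interchangeable.
assert as_utf16 != as_ascii
```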

I hope this clarifies your concerns.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/