[Python-Dev] PEP 263 considered faulty (for some Japanese)

Stephen J. Turnbull stephen@xemacs.org
13 Mar 2002 14:27:51 +0900


>>>>> "SUZUKI" == SUZUKI Hisao <suzuki611@oki.com> writes:

    SUZUKI> I should have appended to that, "And English people will
    SUZUKI> distribute programs with no magic comments all over the
    SUZUKI> world.  Japanese users will use them."

But this "just works" as long as the default encoding is an ASCII
superset (or even JIS X 0201 (^^; as Japanese users are now all
equipped with YEN SIGN <-> REVERSE SOLIDUS codecs).
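
A minimal sketch of the "just works" point, in present-day Python.  The
sample line and the list of codecs are illustrative assumptions, not
taken from the thread, and the JIS X 0201 YEN SIGN wrinkle is left out.

```python
# A pure-ASCII source line with no magic comment (illustrative sample).
source = b'print("hello, world")\n'

# Some ASCII-superset encodings a reader's default might be (assumed list).
ascii_supersets = ["utf-8", "latin-1", "iso8859-2", "euc_jp"]

# Every byte is below 0x80, so each codec yields the same text.
decoded = {name: source.decode(name) for name in ascii_supersets}
all_same = len(set(decoded.values())) == 1
print(all_same)
```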

    SUZUKI> Certainly Japanese users are also free from putting
    SUZUKI> encoding declarations, but we do not expect such programs
    SUZUKI> to be usable in other countries than Japan, given the PEP
    SUZUKI> as is.

But this is also true for everyone else, except Americans.  All of the
common non-ASCII encodings are non-universal and therefore
non-portable, with the exception of UTF-8 and X Compound Text (and the
latter is a non-starter in program sources because of the 0x22
problem).
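
One plausible reading of the "0x22 problem", offered as an assumption:
in ISO 2022-based encodings a multibyte character may contain the byte
0x22, which a byte-oriented parser sees as a double quote.  The sketch
uses Python's iso2022_jp codec as a stand-in for Compound Text, which is
also built on ISO 2022 but uses different designation sequences.

```python
# U+3001 IDEOGRAPHIC COMMA sits at row 1, cell 2 of JIS X 0208,
# so its two-byte code is 0x21 0x22.
comma = "\u3001"

# iso2022_jp is a stand-in here (assumption), not Compound Text itself.
iso2022 = comma.encode("iso2022_jp")
utf8 = comma.encode("utf-8")

# 0x22 is the ASCII double quote, the string-literal delimiter.
quote_in_iso2022 = b'"' in iso2022
quote_in_utf8 = b'"' in utf8
print(quote_in_iso2022, quote_in_utf8)
```

UTF-8 avoids this because every byte of a multibyte sequence is 0x80 or
above.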

I myself objected to this PEP because I think it's far too easy for my
Croatian (Latin-2) friend working in Germany to paste a Latin-1 quote
into a Latin-2 file.  He'll do it anyway on occasion, but if we start
insisting _now_ that "Python programs are written in UTF-8", we'll
avoid a lot of mojibake.  12 years in Japan makes that seem an important
goal.<wink>  But such multiscript processing is surely much rarer
outside Japan.
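
The Latin-1/Latin-2 accident can be sketched in a few lines.  The sample
word is an illustrative assumption; the point is only that a Latin-2
reader accepts the wrong byte silently, while a UTF-8 reader refuses it.

```python
# Text typed in a Latin-1 environment (sample word is illustrative).
pasted = "cr\u00e8me".encode("latin-1")   # b'cr\xe8me'

# Read back as Latin-2, byte 0xE8 silently becomes U+010D (c with caron).
misread = pasted.decode("iso8859-2")

# A UTF-8 reader rejects the stray byte instead of guessing.
try:
    pasted.decode("utf-8")
    utf8_rejects = False
except UnicodeDecodeError:
    utf8_rejects = True
print(misread, utf8_rejects)
```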

    SUZUKI> BTW, when transmitting Python source code between Unix and
    SUZUKI> Windows, we do not necessarily convert encodings.

But this is bad practice.  You can do it if it works for you, but
Python should not avoid useful changes because people are treating
different encodings as the same!

    SUZUKI> Just one worry: [UTF-8 BOM] may be incompatible with
    SUZUKI> '#!/usr/bin/env' used in Unix.

It probably is, but it's out of Python's control: the editor will add
it.  And this can (and will) be handled by changing the shells.
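
A small sketch of why the signature gets in the way, using present-day
Python.  The script contents are an illustrative assumption.

```python
import codecs

# A script as an editor might save it: UTF-8 signature first (sample content).
script = codecs.BOM_UTF8 + b"#!/usr/bin/env python\nprint('hi')\n"

# The '#!' marker is expected as the very first two bytes; the BOM displaces it.
starts_with_shebang = script.startswith(b"#!")

# Python's 'utf-8-sig' codec drops the signature when decoding.
text = script.decode("utf-8-sig")
print(starts_with_shebang, text.startswith("#!"))
```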

    SUZUKI> I understand that making UTF-8 the standard encoding
    SUZUKI> immediately for all source files does not have
    SUZUKI> feasibility.  I'd think we have had two options:

    SUZUKI> 1. Wait until when the UTF-8 is popular, and then adopt [it].

    SUZUKI> 2. Make Python able to handle various [popular] encodings.

There's a third option:

    3.  Make UTF-8 the only encoding acceptable for "standard Python",
        and insert a hook for a codec to be automatically run on source
        text.  Standard Python would _never_ put anything on this hook,
        but an optional library would provide other codecs, including one
        to implement PEP 263.
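
Option 3 might look roughly like the following.  This is a sketch under
stated assumptions, not Python's actual mechanism: the names
source_hook, read_source and latin2_hook are hypothetical.

```python
# Hypothetical hook slot; "standard Python" would leave it empty (None).
source_hook = None

def read_source(raw: bytes) -> str:
    """Decode program text: UTF-8 unless an add-on installed a hook."""
    if source_hook is not None:
        return source_hook(raw)
    return raw.decode("utf-8")

def latin2_hook(raw: bytes) -> str:
    """Example add-on codec (illustrative): treat the source as Latin-2."""
    return raw.decode("iso8859-2")

line = "s = '\u010d'\n"                      # contains c with caron

# Default path: the file must be UTF-8.
default_text = read_source(line.encode("utf-8"))

# What an optional library would do: put its codec on the hook.
source_hook = latin2_hook
hooked_text = read_source(line.encode("iso8859-2"))
print(default_text == hooked_text)
```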

Guido thought the idea had merit, as an implementation.  Therefore
UTF-8 would be encouraged by Python, but PEP 263 would give official
sanction to the -*- coding: xxx -*- cookie.  And this would give you a
lot of flexibility for experimentation (e.g., with UTF-16 codecs).
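
For illustration, this is how the cookie can be read today with
tokenize.detect_encoding from the standard library.  That function
postdates this thread, and the sample source is an assumption.

```python
import io
import tokenize

# A Latin-2 source file carrying a PEP 263 cookie (illustrative sample).
src = b"# -*- coding: iso-8859-2 -*-\nname = '\xe8'\n"

# detect_encoding reads at most two lines and returns (encoding, lines_read).
encoding, _ = tokenize.detect_encoding(io.BytesIO(src).readline)
text = src.decode(encoding)

# With no cookie and no BOM, the answer is UTF-8.
default_encoding, _ = tokenize.detect_encoding(io.BytesIO(b"x = 1\n").readline)
print(encoding, default_encoding)
```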


-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
              Don't ask how you can "do" free software business;
              ask what your business can "do for" free software.