Comment on PEP 263 - Defining Python Source Code Encodings

Sat May 11 09:07:52 EDT 2002

>>>>> "David" == David LeBlanc <whisper at oz.net> writes:

    >> "David LeBlanc" <whisper at oz.net> writes:
    >> 
    >> > Firstly it's NOT "invalid xml". It IS a well formed xml tag!
    >> 
    >> It is well-formed, but it is invalid - check the XML spec. It
    >> violates at least one validity constraint, namely that the root
    >> element must be declared.

    David> XML is used far more often in "well formed" contexts than
    David> in "valid" contexts. This is hair splitting.

But the whole file is not well-formed, either.  So how is your
hypothetical (we Emacs users at least have an existence proof)
XML-grokking programmer's editor going to find it?  It will have to be
as ad hoc as the Emacs convention.  It's not going to be something
that "falls out" naturally from the editor analyzing the file.  In
that context, I think Martin's argument that the Emacs-style cookie is
easier to type is pretty strong.
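
To make "ad hoc" concrete, here is a minimal sketch of how a tool might look for the Emacs-style cookie. It assumes the pattern and the two-line limit described in PEP 263; the function name find_coding_cookie is hypothetical, and a real implementation has more rules about which lines count.

```python
import re

# Pattern taken from PEP 263 (an assumption of this sketch, not a full grammar).
COOKIE_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def find_coding_cookie(source_text):
    """Return the declared encoding name, or None if no cookie is found.

    Hypothetical helper: it scans comment lines among the first two
    lines of the file and nothing else.
    """
    for line in source_text.splitlines()[:2]:
        # The cookie only counts when the line is a comment.
        if not line.lstrip().startswith("#"):
            continue
        match = COOKIE_RE.search(line)
        if match:
            return match.group(1)
    return None
```

The same pattern picks up both the Emacs form ("-*- coding: iso-8859-1 -*-") and the Vim form ("vim: set fileencoding=utf-8 :"). Nothing here depends on parsing the rest of the file.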

BTW, how many of the editors you suggest can actually do anything with
the coding cookie?  Only Emacs and Vim, as far as I know, can actually
switch encodings.  The rest of the (admittedly, all Unix) editors
simply operate in the platform environment, and depend on the console
or windowing system to deal with coding for them.

[OT
    David> Yes emacs runs on many platforms. On most of them, poorly
    David> and with a reduced feature set (this is certainly true of
    David> my several experiments with using emacs and xemacs on
    David> Windows).

When did you last try XEmacs?  Although I detest Windows, I do try
XEmacs for Windows occasionally, and I find the claims of the Windows
maintainers that Windows functionality now exceeds Unix functionality
plausible, although I disagree<wink>.]

    >> Also, the PEP accommodates notepad.exe, by recognizing UTF-8
    >> signatures.

Which, sigh, actually violates the Unicode standard.  (The standard
requires that for UTF-8 and UTF-{16,32}{LE,BE} a leading ZERO-WIDTH
NO-BREAK SPACE be considered exactly that, and it may not be
filtered.)
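
As an illustration of the difference between keeping and filtering the signature, the sketch below decodes the same bytes two ways using Python's standard codecs. The sample bytes are made up for the example. The "utf-8" codec leaves the leading U+FEFF in the text, while "utf-8-sig" treats it as a signature and drops it.

```python
import codecs

# Hypothetical file contents: a UTF-8 signature followed by one line of source.
data = codecs.BOM_UTF8 + b"x = 1\n"

# Plain UTF-8 decoding keeps the leading U+FEFF as a character.
kept = data.decode("utf-8")

# The "utf-8-sig" codec strips a leading signature if one is present.
stripped = data.decode("utf-8-sig")
```

Which of the two behaviours a tool picks is exactly the point in dispute above.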

    David> Actually, for something as important as encoding, I think
    David> using "smart comments" for a feature that is known to the
    David> compiler is a mistake

_Actually_, it should not be known to the translator.  That the
interpreter knows anything about coding at all is a backward
compatibility kludge.

In fact, it is not really possible to do coding detection as part of
the parsing of the file.  Note that the XML spec itself requires that
implementations detect the BOM, and interpret it as a UTF-16
signature.  Only then can the implementation properly lex, detect, and
validate the (required) encoding declaration.
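
A rough sketch of that ordering, assuming only the BOM values exported by Python's codecs module: the byte-level signature is checked first, and only the decoded text is handed on to whatever lexes the declaration. The names sniff_bom and decode_source are hypothetical, and the "ascii" default is an assumption of the sketch.

```python
import codecs

# Checked longest-first, because the UTF-32-LE BOM begins with the
# same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(raw):
    """Return (encoding, bom_length), or (None, 0) when no BOM is present."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_source(raw, default="ascii"):
    """Decode raw bytes, using the BOM (if any) to choose the codec."""
    encoding, skip = sniff_bom(raw)
    # Only after this step can a lexer look for an encoding declaration.
    return raw[skip:].decode(encoding or default)
```

For example, a UTF-16 XML document cannot be searched for its "<?xml ... encoding=...?>" declaration until something like sniff_bom has decided how to turn the bytes into characters.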

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
 My nostalgia for Icon makes me forget about any of the bad things.  I don't
have much nostalgia for Perl, so its faults I remember.  Scott Gilbert c.l.py