[I18n-sig] PEP 263 and Japanese native encodings

07 Mar 2002 20:42:26 +0100

Tamito KAJIYAMA <kajiyama@grad.sccs.chukyo-u.ac.jp> writes:

> You've described only the condition of a syntax error; backslash
> as a second byte causes run-time problems even when it is
> followed by some characters.  

I see. In phase 1 of the PEP, this problem will only occur for byte
strings. For Unicode literals it will not happen: Python will decode
the string before escape characters are considered, so a backslash
byte inside a multi-byte character is never seen as an escape.
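
To make the difference concrete, here is a minimal sketch. It assumes
Shift_JIS as the source encoding and a present-day Python 3 that ships
a "shift_jis" codec; the helper names are hypothetical and exist only
for this illustration. naive_byte_unescape() is a toy stand-in for a
byte-oriented escape pass, not the actual interpreter code.

```python
# Illustration only: assumes Shift_JIS and Python 3 with the "shift_jis" codec.
BACKSLASH = 0x5C


def trail_byte_is_backslash(ch, encoding="shift_jis"):
    """Return True if ch encodes to two bytes whose second byte is 0x5C."""
    raw = ch.encode(encoding)
    return len(raw) == 2 and raw[1] == BACKSLASH


def naive_byte_unescape(raw):
    """Toy byte-oriented escape pass: every 0x5C followed by 'n' becomes LF.

    It has no notion of multi-byte characters, which is the whole problem.
    """
    out = bytearray()
    i = 0
    while i < len(raw):
        if raw[i] == BACKSLASH and i + 1 < len(raw) and raw[i + 1] == ord("n"):
            out.append(0x0A)
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return bytes(out)


text = "\u8868n"                    # U+8868 ("table") followed by ASCII "n"
raw = text.encode("shift_jis")      # three bytes: 0x95 0x5C 0x6E

# Escapes processed on bytes: the trail byte is mistaken for an escape.
damaged = naive_byte_unescape(raw)

# Decoding first keeps the character intact; no backslash character
# exists in the decoded text, so there is nothing to misinterpret.
decoded_first = raw.decode("shift_jis")
```

Here "damaged" ends up as the lead byte 0x95 followed by a line feed,
while "decoded_first" is the original two-character string.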

For byte strings, phase 1 won't bring any changes. Your best bet is to
declare them as raw. In Phase 2, the encoding will be applied to all
strings.

So people who want Japanese strings should use Unicode literals.

> | Or is that a problem that only exists on paper?
> 
> No.  Suppose that you could not put common English words like
> "table", "reserve", "ten" and "paste" in string literals; such
> a restriction would not be acceptable at all, right? :-)

If the restriction were that you cannot have such a word as the last
word of a string (but need some spacing character after it), I think
the restriction might be acceptable - although admittedly arbitrary.

Also, notice that the restriction is only for byte strings.

> I've thought that Marc-Andre's intent for ASCII compatibility
> (i.e., ASCII compatible encodings should be able to represent
> the first two lines of comments only by ASCII characters) is
> good enough.  It appears that his requirement has no problem
> with regard to the implementation strategy described in the PEP
> (revision 1.9) *and* Japanese encodings.  IMHO, the ASCII
> compatibility simply should not impose other requirements.

That sounds nice on paper (or rather, in your email message); it
simply does not work in practice. For it to work, the lexer needs to
operate on Unicode characters instead of bytes. Such a change is quite
complex, and cannot be carried out until phase 2 of the PEP.
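
As a rough sketch of what lexing characters instead of bytes buys, the
following uses a present-day Python 3, where compile() decodes a bytes
source according to its coding declaration before tokenizing. That is
an assumption about the environment, not something available in
phase 1. Declaring latin-1 for the same bytes is used here only to
approximate a byte-oriented lexer, since it maps every byte to one
character. compile_literal() is a hypothetical helper.

```python
# Illustration only: Python 3, bytes source with a coding declaration.
SJIS_TABLE = b"\x95\x5c"   # Shift_JIS bytes for U+8868; trail byte is 0x5C


def compile_literal(declared_encoding):
    """Compile and run 's = "<bytes>"' under the given coding declaration."""
    source = (
        b"# -*- coding: " + declared_encoding.encode("ascii") + b" -*-\n"
        + b's = "' + SJIS_TABLE + b'"\n'
    )
    namespace = {}
    exec(compile(source, "<sketch>", "exec"), namespace)
    return namespace["s"]


# Decoded as Shift_JIS first: the literal holds one character and the
# closing quote is seen as a quote.
value = compile_literal("shift_jis")

# Treated byte by byte: 0x5C escapes the closing quote, so the literal
# never terminates and the compiler rejects the source.
try:
    compile_literal("latin-1")
    bytewise_ok = True
except SyntaxError:
    bytewise_ok = False
```

The first call yields a one-character string; the second fails to
compile, which is the syntax-error case discussed earlier in this
thread.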

Anybody interested is encouraged to discuss implementation strategies
on this list. I know that I probably can't find the time to implement
that part before Python 2.3. Also, I'd think that getting the Japanese
codecs and other CJK codecs into Python would be a prerequisite for
implementing phase 2.

Regards,
Martin