[Python-3000] Unicode and OS strings

Stephen J. Turnbull stephen at xemacs.org
Tue Sep 18 06:08:29 CEST 2007


>>>>> "Marcin 'Qrczak' Kowalczyk" <qrczak at knm.org.pl> writes:
 >>  > Well, if any scheme which attempts to modify UTF-8 by accepting
 >>  > arbitrary byte strings is used, *something* must be interpreted
 >>  > differently than in real UTF-8.

 >> Wrong.  In my scheme everything ends up in the PUA, on which real
 >> UTF-8 imposes no interpretation by definition.

 > This is wrong: UTF-8 is specified for PUA. PUA is not special from the
 > point of view of UTF-8.

It is from the point of view of the Unicode standard, specifically v5.
Please see section 16.5, especially about the "corporate use subarea".
(No, I hadn't considered this stuff yet in my proposal, but it's not
hard to accommodate.)

 > UTF-8 is defined for all Unicode scalar values,

Sure, and what I propose is entirely compatible with the specification
of UTF-8 as a UTF, unlike what you propose.  Until you understand why
that's true, we're at an impasse.
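
Both halves of this exchange can be checked from Python.  The sketch
below is only an illustration: the helper name describe() is made up
for this message, and the general category shown is whatever the
interpreter's bundled Unicode database reports.  It shows that a PUA
code point is encoded by UTF-8 like any other scalar value, while the
standard itself assigns it only the "private use" category (Co).

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return (code point label, general category, UTF-8 bytes as hex).

    describe() is a hypothetical helper, not part of any proposal here.
    """
    return (
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        ch.encode("utf-8").hex(),
    )


# First and last code points of the BMP Private Use Area.
print(describe("\ue000"))  # ('U+E000', 'Co', 'ee8080')
print(describe("\uf8ff"))  # ('U+F8FF', 'Co', 'efa3bf')
```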

 >> I haven't gone back to check yet, but it's possible that a "real UTF-8
 >> conforming process" is required to stop processing and issue an error
 >> or something like that in the cases we're trying to handle.

 > "C10. When a process interprets a code unit sequence which purports to
 > be in a Unicode character encoding form, it shall treat ill-formed code
 > unit sequences as an error condition and shall not interpret such
 > sequences as characters."

Yeah, that's the one.
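
For concreteness, this is roughly what C10-style behavior looks like
with Python's strict UTF-8 decoder.  It is a minimal sketch, and the
function name is_well_formed_utf8() is an assumption of mine, not an
existing API: an ill-formed code unit sequence is reported as an error
and no characters are produced for it.

```python
def is_well_formed_utf8(raw: bytes) -> bool:
    """Return True if raw decodes under strict UTF-8 rules.

    Hypothetical helper: it relies only on bytes.decode() raising
    UnicodeDecodeError for ill-formed input.
    """
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


print(is_well_formed_utf8("café".encode("utf-8")))  # True
print(is_well_formed_utf8(b"caf\xe9"))              # False (Latin-1 byte)
```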

While I'm uncomfortable claiming that my proposal is entirely
compatible with C10, it does treat ill-formed sequences as an error,
and it is arguable that "mapping code units to characters in private
space" is not the same as "interpreting them as characters".  For
obvious reasons I'm not comfortable relying on that distinction, but
I don't consider the possible non-conformance a huge loss in the
context of this thread, since both your proposal and James Knight's do
equally non-conformant things.
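
The "mapping code units to characters in private space" idea can be
sketched with a codec error handler.  Everything specific here is an
assumption for illustration only: the base code point PUA_BASE, the
handler name "pua-escape", and the one-byte-to-one-code-point layout
are not taken from any of the proposals in this thread.  The decoder
still detects the ill-formed sequence as an error; the handler then
replaces each offending byte with a private-use code point.

```python
import codecs

# Hypothetical base: byte 0xNN becomes code point PUA_BASE + 0xNN.
PUA_BASE = 0xE000


def _bytes_to_pua(exc):
    """Error handler: map each undecodable byte into the PUA."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join(chr(PUA_BASE + b) for b in bad)
    # Resume decoding right after the bytes we just replaced.
    return replacement, exc.end


# "pua-escape" is a made-up handler name for this sketch.
codecs.register_error("pua-escape", _bytes_to_pua)


def decode_with_pua(raw: bytes) -> str:
    """Decode UTF-8, escaping ill-formed bytes as PUA code points."""
    return raw.decode("utf-8", errors="pua-escape")


print(ascii(decode_with_pua(b"a\xffb")))  # 'a\ue0ffb'
```

Note that well-formed UTF-8 for a genuine PUA character passes through
unchanged, so this sketch alone cannot tell an escaped byte from a
private-use character that was really in the input.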


