[Python-Dev] Re: [I18n-sig] Re: Unicode debate

Just van Rossum just@letterror.com
Tue, 2 May 2000 16:42:24 +0100


>[Just]
>> You're going to have a hard time explaining that "\377" != u"\377".
>
[GvR]
>I agree.  You are an example of how hard it is to explain: you still
>don't understand that for a person using CJK encodings this is in fact
>the truth.

That depends on the definition of truth: if you document that 8-bit strings
are Latin-1, the above is the truth. Conceptually classifying all other 8-bit
encodings as binary goop makes the semantics crystal clear.
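
To make the two readings concrete, here is a minimal sketch. It assumes
modern Python 3, which postdates this thread: `bytes` stands in for the
8-bit string and `str` for the Unicode string, and the variable names are
purely illustrative. Under the Latin-1 reading each byte maps to the code
point with the same value, so "\377" and u"\377" agree; bytes produced by a
CJK encoding do not survive that reading.

```python
# Illustrative only: bytes plays the role of the 8-bit string type.
raw = b"\xff"  # the octal escape "\377" is the single byte 0xFF

# Latin-1 reading: byte value N is code point N, so the comparison holds.
as_latin1 = raw.decode("latin-1")
print(as_latin1 == "\u00ff")  # True

# CJK reading: one character is stored as two bytes in Shift-JIS.
kanji = "\u65e5"
cjk_bytes = kanji.encode("shift_jis")
print(len(cjk_bytes))  # 2

# Treating those bytes as Latin-1 yields two unrelated characters.
print(cjk_bytes.decode("latin-1") == kanji)  # False

# Only the matching codec recovers the original character.
print(cjk_bytes.decode("shift_jis") == kanji)  # True
```

This is the sense in which the answer depends on which encoding you
document for 8-bit strings.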

>> Again, if you define that "all strings are unicode" and that 8-bit strings
>> contain Unicode characters up to 255, you're all set. Clear semantics, few
>> surprises, simple implementation, etc. etc.
>
>But not all 8-bit strings occurring in programs are Unicode.  Ask
>Moshe.

I know. They can be anything, even binary goop. But that's *only* an
artifact of the fact that 8-bit strings need to double as buffer objects.

Just