[I18n-sig] Re: Unicode debate

Just van Rossum just@letterror.com
Fri, 28 Apr 2000 18:38:14 +0100


At 2:09 PM +0200 28-04-2000, M.-A. Lemburg wrote:
>> 1, because 2 can lead to surprises when two strings containing binary goop
>> are added and only one was a literal in a source file with an explicit
>> encoding.
>
[...]
>I should have been more precise:
>
>2. provided both strings have encodings which can be converted
>   to Unicode, coerce them to Unicode and then apply the action;
>   otherwise proceed as in 1., i.e. the result has an undefined
>   encoding.
>
>If 2. does try to convert to Unicode, conversion errors should
>be raised (just like they are now for Unicode coercion errors).

But that doesn't solve the binary goop problem: two binary gooplets may
have different "encodings", both of which happen to be valid (i.e. decoding
doesn't raise an exception). Conversion to Unicode is in no way what you
want there.
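
A minimal sketch of the problem, written in present-day Python 3 rather than
the interpreter under discussion. The two byte values and their encoding
labels are made up for illustration; the labels stand in for the proposed
per-string encoding.

```python
# Two opaque binary blobs. The encoding labels are hypothetical tags,
# as if each blob had picked up an encoding from where it was created.
blob_a, enc_a = b"\xff\xfe", "latin-1"
blob_b, enc_b = b"\xc3\xa9", "utf-8"

# Both decode without raising, so rule 2 would coerce them to Unicode.
combined = blob_a.decode(enc_a) + blob_b.decode(enc_b)

# Plain byte concatenation, which is what binary data actually needs.
expected = blob_a + blob_b

# Neither encoding turns the Unicode result back into the original bytes.
round_trips = [combined.encode(enc) for enc in (enc_a, enc_b)]
lossy = all(rt != expected for rt in round_trips)
```

No exception is raised anywhere, yet the four original bytes cannot be
recovered from the three-character Unicode result.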

>Some more tricky business:
>
>How should str('bla', 'enc1') and str('bla', 'enc2') compare ?
>What about the hash values of the two ?

I proposed to use the encoding attr *only* when dealing with 8-bit
string/unicode string combos. Just ignore it completely when there's no
unicode string in sight.
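
A rough sketch of that rule in present-day Python 3, where str plays the
role of the unicode string. The class name TaggedBytes and its attributes are
hypothetical, and leaving the result of 8-bit + 8-bit untagged is my
assumption, not something settled in this thread.

```python
class TaggedBytes:
    """Hypothetical 8-bit string with an optional encoding attribute."""

    def __init__(self, data, encoding=None):
        self.data = bytes(data)
        self.encoding = encoding

    def __eq__(self, other):
        # No unicode string in sight: compare raw bytes, ignore the encoding.
        if isinstance(other, TaggedBytes):
            return self.data == other.data
        return NotImplemented

    def __hash__(self):
        # Hash on the bytes only, so equal strings hash equal.
        return hash(self.data)

    def __add__(self, other):
        if isinstance(other, TaggedBytes):
            # 8-bit + 8-bit: plain concatenation; result left untagged
            # (an assumption made for this sketch).
            return TaggedBytes(self.data + other.data)
        if isinstance(other, str):
            # 8-bit + unicode: the only place the encoding attr is consulted.
            if self.encoding is None:
                raise TypeError("no encoding known for 8-bit string")
            return self.data.decode(self.encoding) + other
        return NotImplemented


a = TaggedBytes(b"bla", "latin-1")
b = TaggedBytes(b"bla", "utf-8")
same = (a == b) and (hash(a) == hash(b))
joined = a + b
mixed = TaggedBytes(b"\xe9", "latin-1") + "t"
```

Under this reading, str('bla', 'enc1') and str('bla', 'enc2') compare equal
and hash the same, since no unicode string is involved.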

Just