[Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints)

Just van Rossum just@letterror.com
Wed, 26 Apr 2000 13:04:08 +0100


Fredrik Lundh replied to himself in c.l.py:
>> as far as I can tell, it's supposed to be a feature.
>>
>> if you mix 8-bit strings with unicode strings, python 1.6a2
>> attempts to interpret the 8-bit string as an utf-8 encoded
>> unicode string.
>>
>> but yes, I also think it's a bug.  but this far, my attempts
>> to get someone else to fix it has failed.  might have to do
>> it myself... ;-)
>
>postscript: the powers-that-be has decided that this is not
>a bug.  if you thought that strings were just sequences of
>characters, just as in Perl and Tcl, you're in for one big
>surprise in Python 1.6...

I just read the last few posts of the powers-that-be-list on this subject
(thanks to Christian for pointing out the archives in c.l.py ;-), and I
must say I completely agree with Fredrik. The current situation sucks. A
string should always be a sequence of characters. A utf-8-encoded 8-bit
string in Python is *not* a string, but a "ByteArray". Because of that
distinction, an 8-bit string should never be assumed to be utf-8. (The
default encoding for the builtin unicode() function may be another story.)
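Python 1.6a2 can't be run here, so the following is only a sketch in modern Python 3, where `bytes` stands in for the 8-bit "ByteArray" and `str` is a sequence of characters. The helper `coerce_like_16a2` is hypothetical. It assumes the 1.6a2 mixing rule amounts to "decode the 8-bit string as utf-8", as Fredrik describes above. The sketch shows why that assumption breaks down once the bytes came from some other encoding.

```python
# Sketch in Python 3: bytes ~ 8-bit "ByteArray", str ~ sequence of characters.

def coerce_like_16a2(data: bytes) -> str:
    """Hypothetical helper: mimic the implicit coercion described above,
    where an 8-bit string mixed with a Unicode string is decoded as utf-8."""
    return data.decode("utf-8")

text = "caf\u00e9"                        # "café": 4 characters
latin1_bytes = text.encode("latin-1")     # b'caf\xe9': one byte per character
utf8_bytes = text.encode("utf-8")         # b'caf\xc3\xa9': the accent takes 2 bytes

# Same characters, different byte sequences of different lengths.
print(len(text), len(latin1_bytes), len(utf8_bytes))

# utf-8 bytes decoded as utf-8 give back the original characters...
print(coerce_like_16a2(utf8_bytes) + "!")

# ...but Latin-1 bytes are not valid utf-8, so the implicit decode fails.
try:
    coerce_like_16a2(latin1_bytes) + "!"
except UnicodeDecodeError as exc:
    print("cannot mix:", exc.reason)
```

Python 3 itself does not perform this implicit decode. `b"caf" + "e"` raises a `TypeError`, so the caller has to choose an encoding explicitly.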

Just