[I18n-sig] Re: Unicode debate

Toby Dickenson tdickenson@geminidataloggers.com
Tue, 02 May 2000 14:46:44 +0100


On Tue, 02 May 2000 08:31:55 -0400, Guido van Rossum
<guido@python.org> wrote:

>>     No automatic conversions between 8-bit "strings" and Unicode strings.
>>
>> If you want to turn UTF-8 into a Unicode string, say so.
>> If you want to turn Latin-1 into a Unicode string, say so.
>> If you want to turn ISO-2022-JP into a Unicode string, say so.
>> Adding a Unicode string and an 8-bit "string" gives an exception.
>
>I'd accept this, with one change: mixing Unicode and 8-bit strings is
>okay when the 8-bit strings contain only ASCII (byte values 0 through
>127).  That does the right thing when the program is combining
>ASCII data (e.g. literals or data files) with Unicode and warns you
>when you are using characters for which the encoding matters.  I
>believe that this is important because much existing code dealing with
>strings can in fact deal with Unicode just fine under these
>assumptions.  (E.g. I needed only 4 changes to htmllib/sgmllib to make
>it deal with Unicode strings -- those changes were all getattr() and
>setattr() calls.)
>
>When *comparing* 8-bit and Unicode strings, the presence of non-ASCII
>bytes in either should make the comparison fail; when ordering is
>important, we can make an arbitrary choice e.g. "\377" < u"\200".

I assume 'fail' means 'non-equal', rather than 'raises an exception'?
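To make the question concrete, here is a minimal sketch of the proposed rule as I read it. It is written in modern Python, with bytes standing in for 8-bit strings and str for Unicode strings. The helper names (coerce_8bit, mixed_concat, mixed_equal) are hypothetical. The comparison helper assumes the 'non-equal' reading of 'fail', and ordering is not modelled.

```python
def coerce_8bit(data: bytes) -> str:
    """Coerce an 8-bit string to Unicode only if every byte is ASCII (0..127)."""
    # bytes.decode("ascii") raises UnicodeDecodeError for any byte >= 128.
    return data.decode("ascii")


def mixed_concat(a, b):
    """Concatenate two operands under the ASCII-only coercion rule.

    An 8-bit operand with non-ASCII bytes raises, because its encoding matters.
    """
    if isinstance(a, bytes):
        a = coerce_8bit(a)
    if isinstance(b, bytes):
        b = coerce_8bit(b)
    return a + b


def mixed_equal(a, b):
    """Compare two operands, treating non-ASCII 8-bit data as 'non-equal'.

    This is the reading where 'fail' means False rather than an exception.
    """
    try:
        if isinstance(a, bytes):
            a = coerce_8bit(a)
        if isinstance(b, bytes):
            b = coerce_8bit(b)
    except UnicodeDecodeError:
        return False
    return a == b
```

Under the other reading, mixed_equal would simply let the UnicodeDecodeError propagate, exactly as mixed_concat does.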


Toby Dickenson
tdickenson@geminidataloggers.com