[Python-Dev] re: Unicode as argument for 8-bit strings

Bill Tutt billtut@microsoft.com
Fri, 7 Apr 2000 18:45:03 -0700


> There has been a bug report about the treatment of Unicode
> objects together with 8-bit format strings. The current
> implementation converts the Unicode object to UTF-8 and then
> inserts this value in place of the %s.... 
> 
> I'm inclined to change this to have '...%s...' % u'abc'
> return u'...abc...' since this is just another case of
> coercing data to the "bigger" type to avoid information loss.
> 
> Thoughts ?

Suddenly returning a Unicode string from an operation that used to return
an 8-bit string is likely to give some code extreme fits of despondency.

Converting to UTF-8 doesn't lose any data; however, it certainly might be
unexpected to now find UTF-8 encoded characters in what the user originally
thought was a binary string containing whatever they had wanted it to
contain.
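
To illustrate the surprise, here is a minimal sketch in present-day Python 3,
where bytes and str stand in for the 8-bit and Unicode string types under
discussion. The helper name format_as_utf8 is hypothetical, and the explicit
encode only imitates the implicit conversion described above; it is not the
interpreter's actual code path.

```python
def format_as_utf8(template: bytes, value: str) -> bytes:
    """Imitate the implicit conversion (an assumption for illustration):
    encode the Unicode value as UTF-8, then substitute it into the
    8-bit template."""
    return template % value.encode("utf-8")


# Pure ASCII input looks harmless: the result is the bytes you expect.
print(format_as_utf8(b"...%s...", "abc"))

# A single non-ASCII character (U+00E9) turns into two bytes, so the
# "binary" result now contains a UTF-8 sequence the caller never wrote.
result = format_as_utf8(b"[%s]", "\u00e9")
print(result, len(result))
```

No data is lost, but the second result is four bytes long rather than the
three a caller counting characters might assume.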

Throwing an exception would at the very least force the user to make a
decision one way or the other about what they want to do with the data.
They might want to do a codepage translation, or something else. (aka Hey,
here's a bug I just found for you!)
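
A rough sketch of the exception approach, again in present-day Python 3 with
bytes and str standing in for the two string types. The function name
format_strict and the error message are made up for illustration, and cp1252
is just one example of a codepage the caller might pick.

```python
def format_strict(template: bytes, value: object) -> bytes:
    """Refuse to mix a Unicode value into an 8-bit template
    (hypothetical helper, not an existing API)."""
    if isinstance(value, str):
        raise TypeError(
            "Unicode argument for an 8-bit format string; encode it explicitly"
        )
    return template % value


name = "\u00e9"

# The mixed call fails loudly instead of silently picking an encoding.
try:
    format_strict(b"[%s]", name)
except TypeError as exc:
    print("refused:", exc)

# The caller now has to decide what they actually want.
as_utf8 = format_strict(b"[%s]", name.encode("utf-8"))      # two bytes for U+00E9
as_cp1252 = format_strict(b"[%s]", name.encode("cp1252"))   # one byte for U+00E9
print(as_utf8, as_cp1252)
```

The two results differ, which is exactly the decision the exception forces
into the open.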

In what other cases are you suddenly returning a Unicode string object from
an operation that previously returned a string object?

Bill