[I18n-sig] Format strings

"Martin v. Löwis" martin at v.loewis.de
Mon Nov 28 20:46:15 CET 2005


Josef Spillner wrote:
> On Friday, 25 November 2005 23:16, wrote:
> 
>>It is correct either way. A byte string is a byte string is a byte
>>string is a string of bytes is not a Unicode string.
> 
> 
> That was the second part of my question. If a programmer writes down a string, 
> and the source file encoding is declared to be utf-8, why then is the string 
> still not encoded in utf-8 by default?

But it is encoded in UTF-8! Why do you say it isn't? Being "encoded in
UTF-8" is different from being "a Unicode string". Unicode strings are
a separate data type, distinct from byte strings. UTF-8 is a
*byte* encoding, so a UTF-8 string is *not* a character string,
but a byte string.
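
A minimal sketch of that distinction, written in Python 3 notation as an
assumption (the thread concerns Python 2, where the byte type is `str` and
the character type is `unicode`). The sample text is hypothetical:

```python
# Hypothetical sample text; any non-ASCII string would do.
text = "L\u00f6wis"            # character (Unicode) string: 5 code points
data = text.encode("utf-8")    # byte string holding the UTF-8 encoding

# The two objects have different types and different lengths,
# because the o-umlaut occupies two bytes in UTF-8.
print(type(text).__name__, len(text))
print(type(data).__name__, len(data))

# Decoding the bytes as UTF-8 recovers the character string.
assert data.decode("utf-8") == text
```

The byte string is "encoded in UTF-8", yet it only becomes a character
string once it is explicitly decoded.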

> Why all the hassle of using u"..." instead of making it the default?
> There is a lot of Python source code I maintain, and it would simplify coding
> a lot if this could be made the default.

There is an undocumented -U option that makes all string literals
Unicode strings. Please try it out; you will likely find that
your application breaks.
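
One plausible kind of breakage, offered as an assumption rather than
something stated in this message: code written for byte strings suddenly
receives character strings. The sketch below uses Python 3, which rejects
the mix outright (Python 2 would instead attempt an implicit ASCII
conversion); the `frame` helper is hypothetical:

```python
def frame(payload: bytes) -> bytes:
    """Hypothetical helper: prefix a payload with a one-byte length."""
    return bytes([len(payload)]) + payload

# Works when the literal is a byte string.
ok = frame(b"abc")

# Fails when the same literal is a character string instead.
try:
    frame("abc")
    failed = False
except TypeError:
    failed = True

print(ok, failed)
```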

Regards,
Martin

