[I18n-sig] Format strings

M.-A. Lemburg mal at egenix.com
Mon Nov 28 12:55:40 CET 2005


Josef Spillner wrote:
> On Friday, 25 November 2005 23:16, wrote:
> 
>>It is correct either way. A byte string is a byte string is a byte
>>string is a string of bytes is not a Unicode string.
> 
> 
> That was the second part of my question. If a programmer writes down a string, 
> and the source file encoding is declared to be utf-8, why then is the string 
> still not encoded in utf-8 by default?

Because the source code encoding is only used to decode the Unicode
literals in the source code into Unicode objects.

Plain string literals do not have an encoding attached and
are regarded as plain byte strings. As a result, they pass
through the decoding mechanism unchanged: they are first decoded
to Unicode (using the source code encoding) and then re-encoded
to that same encoding.
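
The round trip described above can be sketched roughly as follows. This
is only a simulation written in current Python syntax, not the
compiler's actual code: the helper names compile_unicode_literal and
compile_plain_literal and the sample text are made up for illustration,
and SOURCE_ENCODING stands in for whatever the coding declaration at the
top of the file says.

```python
# Simulation only: mimics how a declared source encoding affects
# u"..." literals versus plain "..." literals in Python 2.
SOURCE_ENCODING = "utf-8"  # stands in for "# -*- coding: utf-8 -*-"

# The bytes as they sit in the source file between the quotes.
raw_literal = "Gr\u00fc\u00dfe".encode(SOURCE_ENCODING)


def compile_unicode_literal(raw, encoding):
    """u"..." literal: decoded once, result is a Unicode object."""
    return raw.decode(encoding)


def compile_plain_literal(raw, encoding):
    """Plain "..." literal: decoded, then re-encoded to the same
    encoding, so the original bytes come out unchanged."""
    return raw.decode(encoding).encode(encoding)


unicode_obj = compile_unicode_literal(raw_literal, SOURCE_ENCODING)
byte_str = compile_plain_literal(raw_literal, SOURCE_ENCODING)

print(len(unicode_obj))         # counts characters
print(len(byte_str))            # counts bytes
print(byte_str == raw_literal)  # plain literal keeps the file's bytes
```

So the plain literal does contain UTF-8 encoded data if the file is
UTF-8, but it is still just a string of bytes with no encoding
information attached to it.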

> Why all the hassle of using u"..." instead of making it the default?

This will happen in Python 3.0.
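
As a small sketch of what that change means, assuming the Python 3
behaviour where plain literals are Unicode text and byte strings need
an explicit b"..." prefix (the variable names are only for
illustration):

```python
# Python 3 semantics: a plain literal is Unicode text,
# a b"..." literal is a string of bytes.
text = "caf\u00e9"        # Unicode string, 4 characters
data = b"caf\xc3\xa9"     # byte string, 5 bytes (UTF-8 encoded)

# Moving between the two always requires naming an encoding.
encoded = text.encode("utf-8")
decoded = data.decode("utf-8")

print(type(text).__name__, len(text))
print(type(data).__name__, len(data))
print(encoded == data, decoded == text)
```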

> There is a lot of python source code I maintain, and it would simplify coding 
> a lot if this could be made the default.

Indeed, but it potentially also breaks a lot of code since Python
and the many extensions for it are not yet fully Unicode compatible.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Nov 28 2005)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

