Unicode question

Gerhard Häring gh at ghaering.de
Thu Jul 17 20:07:13 EDT 2003


Thomas Heller wrote:
> Gerhard Häring <gh at ghaering.de> writes:
> 
> 
>> >>> u"äöü"
>>u'\x84\x94\x81'
>>
>>(Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")
>>
>>Why does this work?
>>
>>Does Python guess which encoding I mean? I thought Python should
>>refuse to guess :-)
> 
> 
> I stumbled over this yesterday, and it seems it is (at least) partially
> answered by PEP 263:
> 
>     In Python 2.1, Unicode literals can only be written using the
>     Latin-1 based encoding "unicode-escape". This makes the programming
>     environment rather unfriendly to Python users who live and work in
>     non-Latin-1 locales such as many of the Asian countries. Programmers
>     can write their 8-bit strings using the favorite encoding, but are
>     bound to the "unicode-escape" encoding for Unicode literals.
> 
> I have the impression that this is undocumented on purpose, because you
> should not write unescaped non-ansi characters into the source file
> (with 'unknown' encoding).

I agree that silently using Latin-1 as the default is bad. If a 2.3+ 
source file has an encoding cookie (PEP 263), that encoding could be 
used instead.
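
To make the effect concrete, here is a minimal sketch of what probably 
happened. It assumes the literal was typed in a console using code page 
437 (an assumption; the code page is not stated anywhere above), and it 
is written in Python 3 syntax so that it runs on a current interpreter:

```python
# Assumption: the console delivered "äöü" as code page 437 bytes.
typed = "äöü"
raw = typed.encode("cp437")        # the bytes the interpreter received

# Mapping each byte to the code point of the same value is exactly
# what decoding as Latin-1 does.
as_latin1 = raw.decode("latin-1")

# Decoding with the encoding that was really used recovers the text.
as_cp437 = raw.decode("cp437")

print(ascii(raw))
print(ascii(as_latin1))
print(ascii(as_cp437))
```

If that assumption holds, nothing is really being guessed: the bytes are 
mapped one-to-one to code points, which only yields the intended 
characters when the input happens to be Latin-1.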

I stumbled on this when giving another Python user on this list a 
pointer to the relevant section in the Python tutorial 
(http://www.python.org/doc/current/tut/node5.html#SECTION005130000000000000000) 
where Guido uses u"äöü" in an example.

As this is BAD, the tutorial should probably be changed. I'll file a bug 
report.
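
One encoding-independent way to write such a literal, as a sketch only 
(not necessarily how the tutorial should be changed; Python 3 syntax, 
where the u prefix is optional):

```python
import unicodedata

# The escapes are plain ASCII in the source file, so the result does
# not depend on the encoding of the file or of the console.
umlauts = "\xe4\xf6\xfc"

for ch in umlauts:
    print(hex(ord(ch)), unicodedata.name(ch))
```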

-- Gerhard





More information about the Python-list mailing list