Unicode question
Thomas Heller
theller at python.net
Thu Jul 17 12:47:02 EDT 2003
Gerhard Häring <gh at ghaering.de> writes:
> >>> u"äöü"
> u'\x84\x94\x81'
>
> (Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")
>
> Why does this work?
>
> Does Python guess which encoding I mean? I thought Python should
> refuse to guess :-)
I stumbled over this yesterday, and it seems it is (at least) partially
answered by PEP 263:
    In Python 2.1, Unicode literals can only be written using the
    Latin-1 based encoding "unicode-escape". This makes the programming
    environment rather unfriendly to Python users who live and work in
    non-Latin-1 locales such as many of the Asian countries. Programmers
    can write their 8-bit strings using the favorite encoding, but are
    bound to the "unicode-escape" encoding for Unicode literals.
I have the impression that this is undocumented on purpose, because you
should not write unescaped non-ASCII characters into a source file
(whose encoding is 'unknown').
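
To illustrate what is probably going on, here is a minimal sketch. It
is written for Python 3 so it can be run today, whereas the session
above is Python 2.2/2.3. It also assumes the console was using code
page 437 (cp850 yields the same three bytes); that is a guess on my
part, the original post does not say. The names raw, guessed and
intended are only for illustration.

```python
# Python 3 sketch; the session quoted above is Python 2.2/2.3.
# Assumption: the console encoding was cp437 (cp850 gives the same bytes).
raw = "\u00e4\u00f6\u00fc".encode("cp437")  # bytes the console sends for a-umlaut, o-umlaut, u-umlaut

# Effect of the Latin-1 based treatment of u"..." literals:
# each byte becomes the code point with the same numeric value.
guessed = raw.decode("latin-1")

# What the user presumably meant: decode with the console's real encoding.
intended = raw.decode("cp437")

print(repr(raw))                        # b'\x84\x94\x81'
print([hex(ord(c)) for c in guessed])   # ['0x84', '0x94', '0x81']
print([hex(ord(c)) for c in intended])  # ['0xe4', '0xf6', '0xfc']
```

So no real guessing takes place: the bytes are passed through
one-to-one, which happens to give u'\x84\x94\x81' rather than the
characters that were typed.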
Thomas