Unicode question

Thomas Heller theller at python.net
Thu Jul 17 12:47:02 EDT 2003


Gerhard Häring <gh at ghaering.de> writes:

>  >>> u"äöü"
> u'\x84\x94\x81'
>
> (Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")
>
> Why does this work?
>
> Does Python guess which encoding I mean? I thought Python should
> refuse to guess :-)

I stumbled over this yesterday, and it seems to be answered, at least
partially, by PEP 263:

    In Python 2.1, Unicode literals can only be written using the
    Latin-1 based encoding "unicode-escape". This makes the programming
    environment rather unfriendly to Python users who live and work in
    non-Latin-1 locales such as many of the Asian countries. Programmers
    can write their 8-bit strings using the favorite encoding, but are
    bound to the "unicode-escape" encoding for Unicode literals.
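
To make the effect concrete, here is a small sketch of what I think happens
to the three bytes from the session above. It assumes the console was using
code page 437 (a guess on my part; the original post does not say), and it
uses the bytes-literal and decode() spelling of current Python versions
rather than 2.2. The variable names are only for illustration.

```python
# Bytes that ended up in the literal, as shown in the quoted session.
raw = b"\x84\x94\x81"

# "unicode-escape" passes non-escaped bytes through as Latin-1 code points,
# so no guessing is involved: each byte becomes the code point of equal value.
via_unicode_escape = raw.decode("unicode-escape")
via_latin1 = raw.decode("latin-1")

# Assumption: the console encoded the typed characters with code page 437,
# where 0x84, 0x94 and 0x81 are the three umlauts.
via_cp437 = raw.decode("cp437")

# The characters the user meant, written with escapes only.
intended = u"\u00e4\u00f6\u00fc"

print(via_unicode_escape == via_latin1)  # True
print(via_cp437 == intended)             # True
print(via_latin1 == intended)            # False
```

So the literal is accepted, but under that assumption it contains U+0084,
U+0094 and U+0081 (control characters), not the umlauts that were typed.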

I have the impression that this is undocumented on purpose, because you
should not write unescaped non-ASCII characters into a source file whose
encoding is unknown.
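
A minimal sketch of the two spellings that do not depend on guesswork, as
far as I understand PEP 263: escapes, which are pure ASCII, or a literal in
a file that declares its encoding. The utf-8 declaration is only an example
and assumes the file really is saved as UTF-8; the names are hypothetical.

```python
# -*- coding: utf-8 -*-
# The line above is a PEP 263 encoding declaration. It is an assumption of
# this sketch that the file is actually saved as UTF-8.

# Pure-ASCII spelling: works regardless of the source file's encoding.
escaped = u"\u00e4\u00f6\u00fc"

# Non-ASCII spelling: only reliable because the encoding is declared.
literal = u"äöü"

print(escaped == literal)
print(len(escaped))
```

With the declaration in place both spellings should give the same three
characters; without it, only the escaped form is safe.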

Thomas



