Unicode perplex

Irmen de Jong irmen at -nospam-remove-this-xs4all.nl
Mon Jun 21 17:08:45 EDT 2004


John Roth wrote:

> Remember that the trick
> is that it's still going to have the *same* stream of
> bytes (at least if the Unicode string is implemented
> in UTF-8.) 

Which it isn't.

AFAIK Python's storage format for Unicode strings is
some form of 2-byte representation; it certainly isn't
UTF-8.

So if you want to turn your string into a Python Unicode
object, you really have to push it through the UTF-8 codec...
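
A minimal sketch of what that looks like, written for Python 3, where
byte strings and text are separate types (the same decode call also
existed on byte strings in Python 2). The sample bytes and the names
raw and text are only for illustration:

```python
# Bytes as they might arrive from a file or socket: UTF-8 encoded text.
raw = b"caf\xc3\xa9"   # 5 bytes; the last two encode U+00E9

# Decoding pushes the bytes through the UTF-8 codec and yields a
# Unicode text object, whatever its internal storage happens to be.
text = raw.decode("utf-8")

print(len(raw))    # 5 (bytes)
print(len(text))   # 4 (code points)

# Encoding goes the other way and reproduces the original bytes.
assert text.encode("utf-8") == raw
```

The byte count and the character count differ, which shows that the
UTF-8 stream and the Unicode object are not the same thing.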

--Irmen


