problem with unicode

Bjoern Schliessmann usenet-mail-0306.20.chr0n0ss at spamgourmet.com
Fri Apr 25 08:01:12 EDT 2008


andreas.profous at googlemail.com wrote:

> # media is a binary string (mysql escaped zipped file)
> 
>>>> print media
> x???[?
...
> (works)

Which encoding is it, perhaps UTF-8 or ISO-8859-1?
 
>>>> print unicode(media)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0x9c in
> position 1: ordinal not in range(128)
> (ok i guess print assumes you want to print to ascii)

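Not at all: print is not the problem here. unicode() tries to decode
the byte string you gave it, but since you didn't tell it which
encoding to use, it falls back to the default encoding, ASCII, and
byte 0x9c lies outside the ASCII range.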
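A minimal sketch of the same failure in Python 3 syntax, where a byte string is a `bytes` object and the ASCII fallback has to be spelled out explicitly. The two bytes `b"x\x9c"` are an assumption, chosen only to match "byte 0x9c in position 1" in the traceback; they are not the real contents of `media`.

```python
# Python 3 sketch: bytes that are not pure ASCII cannot be decoded as ASCII.
# The sample bytes are hypothetical, picked to match the traceback above.
media = b"x\x9c"

try:
    # Roughly what Python 2's unicode(media) does with the default encoding
    media.decode("ascii")
except UnicodeDecodeError as err:
    # err.start is the offset of the first byte the codec could not decode
    print(err.encoding, hex(media[err.start]), err.start)  # ascii 0x9c 1
```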

You should decode all "incoming" byte strings to unicode objects
using the right encoding; here I tried yours with UTF-8. The easiest
way is the byte string's "decode" method, which returns a unicode
object.
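A small sketch, again in Python 3 syntax, of decoding incoming bytes once with an explicitly named encoding. `to_text` is a hypothetical helper for illustration, not a library API. It also shows why guessing matters: ISO-8859-1 maps every byte to a character, so a wrong guess produces garbage rather than an error.

```python
# Python 3 sketch: decode incoming bytes once, with an explicit encoding.
# to_text is a hypothetical helper, not part of any library.
def to_text(data: bytes, encoding: str = "utf-8") -> str:
    """Return data decoded with the given encoding; raises on invalid input."""
    return data.decode(encoding)

raw = "Bj\u00f6rn".encode("utf-8")   # b'Bj\xc3\xb6rn'
text = to_text(raw)                  # 'Björn', the right encoding
wrong = to_text(raw, "iso-8859-1")   # 'BjÃ¶rn': wrong guess, yet no error
```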

>>> media="xワユロ[ヨ"
>>> print repr(media.decode("utf-8"))
u'x\u30ef\u30e6\u30ed[\u30e8'
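For comparison, a Python 3 sketch of the session above. The katakana text is taken from the repr shown; `.encode("utf-8")` builds the byte string that Python 2 held in `media`, and `ascii()` reproduces the escaped Python 2 style repr.

```python
# Python 3 sketch of the interactive session above.
# Build the UTF-8 byte string, then decode it back to text.
media = "xワユロ[ヨ".encode("utf-8")
text = media.decode("utf-8")
print(ascii(text))  # 'x\u30ef\u30e6\u30ed[\u30e8'
```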

Regards,


Björn





More information about the Python-list mailing list