problem with unicode

John Machin sjmachin at lexicon.net
Fri Apr 25 08:46:09 EDT 2008


On Apr 25, 10:01 pm, Bjoern Schliessmann <usenet-mail-0306.20.chr0n... at spamgourmet.com> wrote:
> andreas.prof... at googlemail.com wrote:
> > # media is a binary string (mysql escaped zipped file)
>
> > >>> print media
> > x???[? ...
> > (works)
>
> Which encoding, perhaps UTF-8 or ISO8859-1?
>
> > >>> print unicode(media)
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0x9c in
> > position 1: ordinal not in range(128)
> > (ok i guess print assumes you want to print to ascii)
>
> Not at all -- unicode tries to decode the byte string you gave it,
> but doesn't know which encoding to use, so it falls back to ASCII.
>
> You should decode all "incoming" byte strings to unicode objects
> using the right encoding -- here I tried yours with UTF-8. This
> works best using string's method "decode" which returns a unicode
> object.
>
> >>> media="x???[?"
> >>> print repr(media.decode("utf-8"))
>
> u'x\u30ef\u30e6\u30ed[\u30e8'
>

But that_unicode_string.encode("utf-8") produces
'x\xe3\x83\xaf\xe3\x83\xa6\xe3\x83\xad[\xe3\x83\xa8'
which does not contain the complained-about byte 0x9c in position 1
(or any other position) -- how can that be?
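
A minimal sketch of the mismatch, written in Python 3 syntax rather than the Python 2 used in this thread (an assumption made so that it runs on a current interpreter). The first part re-encodes the string from the quoted reply and confirms that 0x9c is absent. The second part is only a guess at what the original data looked like: `media` here is a hypothetical stand-in built with zlib.compress, chosen because the poster described the data as a zipped file and because zlib output at the default compression level begins with the bytes 'x' and 0x9c. Nothing in the thread confirms that the real data was produced this way.

```python
import zlib

# The unicode string from the quoted reply, written with explicit escapes.
decoded = "x\u30ef\u30e6\u30ed[\u30e8"
encoded = decoded.encode("utf-8")
print(encoded)
has_9c = 0x9C in encoded
print("0x9c present after re-encoding:", has_9c)

# Hypothetical stand-in for the poster's data (an assumption, not the
# actual file): zlib-compressed bytes at the default compression level.
media = zlib.compress(b"example payload")
print(media[:2])

# Decoding with the ASCII codec fails at the first byte outside 0..127.
error_position = None
try:
    media.decode("ascii")
except UnicodeDecodeError as exc:
    error_position = exc.start
    print(exc)
```

If the guess holds, the '?' characters in the posting are only how the terminal or mail client displayed undecodable bytes, so pasting them back into an interpreter yields different data from the original.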







