Double decoding of strings??

manuzhai at gmail.com
Mon Dec 5 07:35:19 EST 2005


Hi all,

I have a bit of a problem. I'm trying to use Python to work with some
data that turns out to be garbled: it looks as if it was UTF-8 encoded
twice. I think the solution is to .decode('utf-8') the string twice, but
Python raises an error the second time. That might be understandable,
but then why does the unicode object have a .decode() method at all?

At first I get:

    'WVL Algemeen Altru\xc3\x83\xc2\xafsme genormeerd Afbeelden'

I .decode('utf-8') this to:

    u'WVL Algemeen Altru\xc3\xafsme genormeerd Afbeelden'

I then try to .decode('utf-8') this again, but that gives an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Program Files\Python\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-19: ordinal not in range(128)

If I copy/paste the plain byte string

    'WVL Algemeen Altru\xc3\xafsme genormeerd Afbeelden'

and .decode('utf-8') it, that works fine, and it gets me the result I
want:

    u'WVL Algemeen Altru\xefsme genormeerd Afbeelden'
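
In case it helps to reproduce this, here is a minimal sketch of the steps. The variable names are mine. The latin-1 step is only a guess at how to get from the unicode object back to the byte string I pasted by hand; I have not confirmed it is the right way to do this. The b'' and u'' prefixes assume a Python version that accepts both.

```python
# Byte string as it arrives (assumption: UTF-8 was applied twice upstream).
raw = b'WVL Algemeen Altru\xc3\x83\xc2\xafsme genormeerd Afbeelden'

# First decode: gives a unicode object that still contains U+00C3 U+00AF.
once = raw.decode('utf-8')

# Assumption: each remaining code point stands for exactly one byte, so
# encoding as latin-1 yields the same byte string I copy/pasted by hand.
as_bytes = once.encode('latin-1')

# Second decode, applied to a byte string instead of a unicode object.
twice = as_bytes.decode('utf-8')

print(twice == u'WVL Algemeen Altru\xefsme genormeerd Afbeelden')  # True
```

If that guess holds, the second .decode('utf-8') only fails when it is called on the unicode object directly, not on the equivalent bytes.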

Why does it work this way? How can I make it work?

Regards,

Manuzhai



