Unicode/ascii encoding nightmare

Thomas W thomas.weholt at gmail.com
Mon Nov 6 14:50:50 EST 2006


I'm getting really annoyed with Python with regard to
Unicode/ASCII encoding problems.

The string below is the encoding of the Norwegian word "fødselsdag".

>>> s = 'f\xc3\x83\xc2\xb8dselsdag'

I stored the string as "fødselsdag", but somewhere in my code it got
translated into the mess above and I cannot get the original string
back. It cannot be printed in the console or written to a plain text
file. I've tried to convert it using:

>>> s.encode('iso-8859-1')
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1:
ordinal not in range(128)

>>> s.encode('utf-8')
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1:
ordinal not in range(128)
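
A possible reading of the two tracebacks, offered as a sketch rather
than a diagnosis: in Python 2, calling .encode() on a plain byte string
first decodes it implicitly with the default 'ascii' codec, which would
explain why an *encode* call raises a UnicodeDecodeError. The snippet
below imitates that step explicitly. It is written for Python 3 so it
can be run today, and encode_like_python2 is a hypothetical helper name,
not a real API.

```python
# The byte string from the post, written as a bytes literal.
s = b'f\xc3\x83\xc2\xb8dselsdag'


def encode_like_python2(raw, target):
    """Hypothetical helper: mimic Python 2's str.encode() on a byte string.

    Python 2 first turns the bytes into unicode with the default 'ascii'
    codec, and only then encodes to the requested target codec.
    """
    return raw.decode('ascii').encode(target)


try:
    encode_like_python2(s, 'utf-8')
    error = None
except UnicodeDecodeError as exc:
    # The failure happens in the implicit decode step, before any
    # encoding to the target codec is attempted.
    error = exc

print(error)
```

If this reading is right, the target codec passed to .encode() never
matters, because the ASCII decode fails first at byte 0xc3.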

Neither call helps. I cannot remember having these problems in earlier
versions of Python, and it's really annoying. Even if it's my own fault
somehow, handling normal characters like this shouldn't cause this
much hassle. Searching Google for "codec can't decode byte",
UnicodeDecodeError, etc. produces a bunch of hits, so it's obvious I'm
not alone.

Any hints?
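
One guess about the byte pattern itself, assuming the word was UTF-8
encoded, then mistakenly decoded as Latin-1 and UTF-8 encoded a second
time: the bytes \xc3\x83\xc2\xb8 are what "ø" looks like after that
round trip. Under that assumption the damage can be reversed by walking
the steps backwards. This is a minimal Python 3 sketch, and
undo_double_utf8 is a hypothetical helper name.

```python
# The byte string from the post, written as a bytes literal.
s = b'f\xc3\x83\xc2\xb8dselsdag'


def undo_double_utf8(raw):
    """Hypothetical helper: reverse one round of UTF-8 double encoding.

    Assumes the text was UTF-8 encoded, wrongly decoded as Latin-1,
    and then UTF-8 encoded again.
    """
    text = raw.decode('utf-8')               # 'fÃ¸dselsdag'
    original_bytes = text.encode('latin-1')  # b'f\xc3\xb8dselsdag'
    return original_bytes.decode('utf-8')    # text with a real 'ø'


recovered = undo_double_utf8(s)
print(recovered)
```

In Python 2 the same chain can be written directly on the byte string
as s.decode('utf-8').encode('latin-1').decode('utf-8'), which yields a
unicode object that can then be encoded to 'iso-8859-1' or 'utf-8'
explicitly for output.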



