Unicode/ascii encoding nightmare
Robert Kern
robert.kern at gmail.com
Mon Nov 6 15:29:58 EST 2006
Thomas W wrote:
> I'm getting really annoyed with python in regards to
> unicode/ascii-encoding problems.
>
> The string below is the encoding of the norwegian word "fødselsdag".
>
> >>> s = 'f\xc3\x83\xc2\xb8dselsdag'
>
> I stored the string as "fødselsdag" but somewhere in my code it got
> translated into the mess above and I cannot get the original string
> back. It cannot be printed in the console or written to a plain text file.
> I've tried to convert it using
>
> >>> s.encode('iso-8859-1')
> Traceback (most recent call last):
> File "<interactive input>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1:
> ordinal not in range(128)
>
> >>> s.encode('utf-8')
> Traceback (most recent call last):
> File "<interactive input>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1:
> ordinal not in range(128)
>
> And nothing helps. I cannot remember having these problems in earlier
> versions of python and it's really annoying, even if it's my own fault
> somehow, handling of normal characters like this shouldn't cause this
> much hassle. Searching google for "codec can't decode byte" and
> UnicodeDecodeError etc. produces a bunch of hits so it's obvious I'm
> not alone.
You want .decode(), which converts a byte string into a Unicode string, not
.encode(), which converts a Unicode string into a byte string. You get a
UnicodeDecodeError even though you are calling .encode() because whenever Python
expects a Unicode string but gets a byte string, it first tries to decode the
byte string as 7-bit ASCII. If that fails, it raises a UnicodeDecodeError.
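A minimal sketch of that distinction, written in Python 3 syntax as an assumption (the session quoted above is Python 2, where the ASCII decode happens implicitly inside .encode()). Python 3 bytes objects have no .encode() method, so the ASCII step is spelled out explicitly here:

```python
# Python 3 sketch; the thread itself uses Python 2, where calling .encode()
# on a byte string first decodes the bytes implicitly as ASCII.
s = b'f\xc3\x83\xc2\xb8dselsdag'  # the byte string from the question

# The implicit step that Python 2 performs, made explicit here.
try:
    s.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc  # byte 0xc3 is outside the 7-bit ASCII range

# .decode() goes from bytes to a Unicode string ...
text = s.decode('utf-8')
# ... and .encode() goes from a Unicode string back to bytes.
round_trip = text.encode('utf-8')
```

Decoding with an explicit codec succeeds where the ASCII attempt fails, and encoding the result again gives back the same bytes.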
However, I don't know of an encoding that takes u"fødselsdag" to
'f\xc3\x83\xc2\xb8dselsdag'.
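One possible explanation, offered as an assumption rather than something established in this thread: the bytes match what you get when the UTF-8 encoding of "fødselsdag" is mistakenly decoded as ISO-8859-1 and then encoded as UTF-8 a second time. A Python 3 sketch (the thread uses Python 2; the variable names are illustrative) that reproduces the bytes and reverses the two steps:

```python
# Python 3 sketch of the assumed "double UTF-8 encoding" path.
original = 'f\u00f8dselsdag'               # the word "fødselsdag"

utf8_once = original.encode('utf-8')       # correct UTF-8 bytes
misread = utf8_once.decode('iso-8859-1')   # wrong codec: each byte becomes a character
mangled = misread.encode('utf-8')          # encoded a second time

# Reverse the two steps to get the original word back.
recovered = mangled.decode('utf-8').encode('iso-8859-1').decode('utf-8')
```

Whether this is what happened in the original poster's code cannot be confirmed from the message; the sketch only shows that such a round trip produces exactly these bytes and can be undone.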
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco