Unicode/ascii encoding nightmare

Robert Kern robert.kern at gmail.com
Mon Nov 6 15:29:58 EST 2006


Thomas W wrote:
> I'm getting really annoyed with Python with regard to
> unicode/ascii-encoding problems.
> 
> The string below is the encoding of the Norwegian word "fødselsdag".
> 
> >>> s = 'f\xc3\x83\xc2\xb8dselsdag'
> 
> I stored the string as "fødselsdag" but somewhere in my code it got
> translated into the mess above and I cannot get the original string
> back. It cannot be printed in the console or written to a plain text file.
> I've tried to convert it using
> 
> >>> s.encode('iso-8859-1')
> Traceback (most recent call last):
>   File "<interactive input>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1:
> ordinal not in range(128)
> 
> >>> s.encode('utf-8')
> Traceback (most recent call last):
>   File "<interactive input>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1:
> ordinal not in range(128)
> 
> And nothing helps. I cannot remember having these problems in earlier
> versions of Python, and it's really annoying. Even if it's my own fault
> somehow, handling normal characters like this shouldn't cause this
> much hassle. Searching Google for "codec can't decode byte",
> UnicodeDecodeError, etc. produces a bunch of hits, so it's obvious I'm
> not alone.

You want .decode() (which converts a byte string into a Unicode string),
not .encode() (which converts a Unicode string into a byte string). You get a
UnicodeDecodeError even though you are calling .encode() because .encode()
expects a Unicode string. Whenever Python expects a Unicode string but gets a
byte string, it first tries to decode the byte string as 7-bit ASCII. If that
fails, it raises a UnicodeDecodeError.
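
A minimal sketch of that behaviour, written for Python 3 as an assumption (the
session quoted above is Python 2, where byte strings still had an .encode()
method). The helper py2_style_encode is hypothetical; it only imitates the
implicit ASCII decode described above and is not part of any library.

```python
s = b'f\xc3\x83\xc2\xb8dselsdag'


def py2_style_encode(byte_string, encoding):
    # Hypothetical helper: imitates what Python 2 did when .encode() was
    # called on a byte string, i.e. an implicit ASCII decode first.
    return byte_string.decode('ascii').encode(encoding)


try:
    py2_style_encode(s, 'utf-8')
    error = None
except UnicodeDecodeError as exc:
    # The failure comes from the hidden decode step, not from the encode.
    error = exc

# Decoding with an explicit codec is the operation that was actually wanted.
text = s.decode('utf-8')
```

Here error.encoding and error.start identify the codec and the offending byte
position, which correspond to the 'ascii' and "position 1" in the traceback.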

However, I don't know of an encoding that takes u"fødselsdag" to
'f\xc3\x83\xc2\xb8dselsdag'.
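
No single encoding step produces those bytes, but they are consistent with
UTF-8 having been applied twice, with a Latin-1 misreading in between. That is
only one possible explanation, not something the original report establishes.
A sketch in Python 3 (an assumption; all variable names are illustrative):

```python
original = 'f\u00f8dselsdag'  # "fødselsdag"

# Step 1: the text is encoded to UTF-8 once, as intended.
utf8_once = original.encode('utf-8')

# Step 2 (assumed mistake): those bytes are misread as Latin-1 text.
misread = utf8_once.decode('latin-1')

# Step 3: the misread text is encoded to UTF-8 a second time.
utf8_twice = misread.encode('utf-8')

# Undoing the assumed mistake: reverse steps 3, 2 and 1 in order.
repaired = utf8_twice.decode('utf-8').encode('latin-1').decode('utf-8')
```

If the data really was damaged this way, the last line recovers the original
string; if it was damaged some other way, the .encode('latin-1') or the final
.decode('utf-8') may raise an error instead.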

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco



