Unicode/ascii encoding nightmare

Georg Brandl g.brandl-nospam at gmx.net
Mon Nov 6 16:06:53 EST 2006


Thomas W wrote:
> I'm getting really annoyed with python in regards to
> unicode/ascii-encoding problems.
> 
> The string below is the encoding of the Norwegian word "fødselsdag".
> 
>>>> s = 'f\xc3\x83\xc2\xb8dselsdag'

Which encoding is this?
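
My guess, and it is only a guess from the byte values, is that this is UTF-8 text that was mistakenly decoded as ISO-8859-1 and then encoded as UTF-8 a second time. A minimal sketch of undoing that, assuming the guess is right (the variable names are made up for illustration, and the b'' prefix is only there so the snippet also runs on newer Pythons):

```python
# The byte string from the original post.
garbled = b'f\xc3\x83\xc2\xb8dselsdag'

# Undo the outer (second) UTF-8 encoding.
once = garbled.decode('utf-8')        # u'f\xc3\xb8dselsdag'

# Undo the assumed wrong ISO-8859-1 decode: this gives back the
# bytes of the original, single UTF-8 encoding.
inner = once.encode('iso-8859-1')     # b'f\xc3\xb8dselsdag'

# Decode the real UTF-8 data to a Unicode string.
word = inner.decode('utf-8')          # u'f\xf8dselsdag'
```

If the data went through some other pair of encodings, the middle step needs a different codec.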

> I stored the string as "fødselsdag" but somewhere in my code it got

You stored it where?

> translated into the mess above and I cannot get the original string
> back. It cannot be printed in the console or written to a plain text file.
> I've tried to convert it using
> 
>>>> s.encode('iso-8859-1')
> Traceback (most recent call last):
>   File "<interactive input>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1:
> ordinal not in range(128)

Note that calling "encode" on a (byte) string object is often an indication of an error.
The encoding direction (for "normal" encodings, not special things like
the "zlib" codec) is as follows:

encode: from Unicode
decode: to Unicode

(The encode method of byte strings first DEcodes the string with the default
encoding, which is normally ascii, then ENcodes the result with the given
encoding. That implicit decode step is what raises the UnicodeDecodeError
above.)
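
To make the two steps visible, here is a small sketch that spells them out by hand. It assumes the bytes are UTF-8 (which the poster has not confirmed), and it uses the b'' prefix and "except ... as" syntax so that it also runs on newer Python versions:

```python
s = b'f\xc3\x83\xc2\xb8dselsdag'   # the byte string from the original post

# Step 1 of the implicit conversion: decode with the default codec (ascii).
try:
    s.decode('ascii')
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    message = str(exc)             # names byte 0xc3 at position 1

# Step 2 (encoding to the target codec) is never reached.
# Doing the decode explicitly, with the codec the bytes are really in,
# avoids the hidden ascii step.
u = s.decode('utf-8')              # byte string -> Unicode
roundtrip = u.encode('utf-8')      # Unicode -> byte string
```

With an explicit decode the direction is always clear: decode produces Unicode, encode consumes it.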

>>>> s.encode('utf-8')
> Traceback (most recent call last):
>   File "<interactive input>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1:
> ordinal not in range(128)
> 
> And nothing helps. I cannot remember having these problems in earlier
> versions of Python and it's really annoying, even if it's my own fault
> somehow, handling of normal characters like this shouldn't cause this
> much hassle. Searching google for "codec can't decode byte" and
> UnicodeDecodeError etc. produces a bunch of hits so it's obvious I'm
> not alone.

Unicode causes many problems if not used properly. If you want to use Unicode
strings, use them everywhere in your Python application, decode input as early
as possible, and encode output only before writing it to a file or another
program.
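
A minimal sketch of that pattern follows. The input bytes, the choice of UTF-8 for input and ISO-8859-1 for output, and the variable names are all assumptions for illustration:

```python
# Bytes as they might arrive from a file or socket (assumed to be UTF-8).
raw = b'f\xc3\xb8dselsdag'

# Decode once, as early as possible.
text = raw.decode('utf-8')

# All processing inside the program works on Unicode strings.
shout = text.upper() + u'!'

# Encode once, just before the data leaves the program.
out = shout.encode('iso-8859-1')
```

Everything between the decode and the encode never has to care which byte encoding the outside world uses.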

Georg


