Python and encodings drives me crazy

Diez B. Roggisch deets at web.de
Mon Jun 20 18:00:32 EDT 2005


Oliver Andrich wrote:
> Well, I narrowed my problem down to writing a macroman or cp850 file
> using the codecs module. The rest was basically a misunderstanding
> about codecs module and the wrong assumption, that my input data is
> iso-latin-1 encoded. It is UTF-8 encoded. So, currently I am at the
> point where I have my data ready for writing....
> 
> Does the following code write headline and caption in MacRoman
> encoding to the disk? Or better that, is this the way to do it?
> headline and caption are both unicode strings.
> 
>     f = codecs.open(outfilename, "w", "macroman")
>     f.write(headline)
>     f.write("\n\n")
>     f.write(caption)
>     f.close()

Looks ok - but you should use u"\n\n" in general. If that line for some
reason changes to "öäü" (German umlauts), you'll get the error you
already observed. With u"äöü", on the other hand, the parser pukes at
you when the declared coding of the source file can't decode those
bytes into the unicode object.
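
As a minimal sketch of the write shown in the quoted code, with every
literal a unicode object: the sample values for headline and caption
and the temporary file path are assumptions for illustration, not taken
from the original post. The umlauts are written as \u escapes so the
example does not depend on the coding declared for the source file.

```python
import codecs
import os
import tempfile

# Sample data (assumed); the post only says both are unicode strings.
headline = u"Sch\u00f6ne Gr\u00fc\u00dfe"
caption = u"\u00e4\u00f6\u00fc"

# Hypothetical output path; a temporary file stands in for outfilename.
fd, outfilename = tempfile.mkstemp(suffix=".txt")
os.close(fd)

# Every write gets a unicode object, so the codec only ever encodes.
f = codecs.open(outfilename, "w", "macroman")
f.write(headline)
f.write(u"\n\n")
f.write(caption)
f.close()

# Read the raw bytes back to check what actually landed on disk.
with open(outfilename, "rb") as check:
    raw = check.read()
os.remove(outfilename)

expected = (headline + u"\n\n" + caption).encode("mac_roman")
print(raw == expected)
```

Reading the file back in binary mode and comparing against an explicit
encode is a cheap way to confirm the encoding on disk.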

Most problems occur when one confuses unicode objects with strings -
this requires a coercion that will be done using the default encoding
(normally ASCII), which produces the error you already observed.
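
A rough illustration of that coercion, written so it runs the same way
on old and new Pythons: the explicit decode("ascii") below stands in
for what the implicit coercion does under an ASCII default encoding,
and the sample bytes are an assumption, not data from the thread.

```python
# Bytes as they might come out of a UTF-8 encoded input file (assumed sample).
raw = u"\u00f6\u00e4\u00fc".encode("utf-8")

# Stand-in for the implicit coercion: decoding the bytes as ASCII.
try:
    raw.decode("ascii")
    coercion_failed = False
except UnicodeDecodeError:
    coercion_failed = True

# Decoding explicitly with the real input encoding avoids the coercion,
# and the resulting unicode object can then be encoded for output.
text = raw.decode("utf-8")
macroman_bytes = text.encode("mac_roman")

print(coercion_failed)
```

Decoding at the input boundary and encoding only when writing keeps
byte strings and unicode objects from meeting in the middle.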


Diez


