convert strings to utf-8

"Martin v. Löwis" martin at v.loewis.de
Sun Feb 25 13:07:53 EST 2007


Niclas wrote:
> I'm having trouble working with the special characters in Swedish (Å Ä Ö
> å ä ö). The script parses and extracts information from a webpage.
> This works fine and I get all the data correctly. The information is
> then added to an RSS file (using xml.dom.minidom.Document() to create the
> file), and this is where it goes wrong. Letters like Å ä ö get messed up and
> the RSS file does not validate. How can I convert the data to UTF-8
> without losing the special letters?

You should convert the strings from the webpage to Unicode strings.
You can check that a string s is a Unicode string: it is if

print isinstance(s, unicode)

prints True. Make sure *every* string you put into the Document
actually is a Unicode string. Then it will work fine.
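For readers on Python 3, where `str` is the Unicode type and the check above becomes `isinstance(s, str)`, the advice can be sketched roughly as follows. This is a minimal sketch, not Martin's code: the function name `build_rss_title`, the sample text, and the assumption that the page is ISO-8859-1 encoded are all hypothetical (in practice the encoding would typically come from the page's HTTP headers or a meta tag). The idea is to decode once on input, keep only Unicode strings inside the `Document`, and encode once on output with `toxml(encoding="utf-8")`.

```python
from xml.dom.minidom import Document


def build_rss_title(raw: bytes, page_encoding: str) -> bytes:
    """Hypothetical helper: decode page bytes to Unicode, put them into a
    minidom Document, and serialize the document as UTF-8 bytes."""
    # Decode the raw bytes from the webpage into a Unicode string (str).
    title = raw.decode(page_encoding)

    # Build the document from Unicode strings only.
    doc = Document()
    rss = doc.createElement("rss")
    doc.appendChild(rss)
    elem = doc.createElement("title")
    elem.appendChild(doc.createTextNode(title))
    rss.appendChild(elem)

    # Encode exactly once, when writing the XML out; this returns bytes.
    return doc.toxml(encoding="utf-8")


# Hypothetical page content, assumed here to be ISO-8859-1 encoded.
page_bytes = "Åsa äter ö".encode("iso-8859-1")
xml_bytes = build_rss_title(page_bytes, "iso-8859-1")
print(xml_bytes.decode("utf-8"))
```

Decoding at the boundary keeps the rest of the script free of encoding concerns; only the final `toxml` call decides the output encoding, and it also writes the matching `encoding="utf-8"` declaration.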

Regards,
Martin


