convert strings to utf-8

Diez B. Roggisch deets at nospam.web.de
Sun Feb 25 10:34:53 EST 2007


Niclas wrote:
> Hi
> 
> I'm having trouble working with the special characters in Swedish (Å Ä Ö 
> å ä ö). The script parses and extracts information from a webpage. 
> This works fine and I get all the data correctly. The information is 
> then added to an RSS file (using xml.dom.minidom.Document() to create the 
> file), and this is where it goes wrong. Letters like Å ä ö get messed up and 
> the RSS file does not validate. How can I convert the data to UTF-8 
> without losing the special letters?

Show us your code and some example text (although I know it is difficult 
to get that right over news/mail).

The basic idea is this:

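# scrape_the_website() stands for whatever fetches the page as raw bytes
scraped_byte_string = scrape_the_website()

# replace 'website-encoding' with the page's real encoding, e.g. 'iso-8859-1'
output = scraped_byte_string.decode('website-encoding').encode('utf-8')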
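
Since the RSS file is built with xml.dom.minidom, here is a rough sketch of how the same idea could look end to end. It assumes a current Python 3 (where bytes and text are separate types), and the ISO-8859-1 source encoding, the sample string and the element names are made up for illustration, not taken from the real script. The point is to decode once on input, hand only text to the DOM, and encode once on output:

```python
from xml.dom.minidom import Document

# Stand-in for the scraped page: raw bytes in an ASSUMED source encoding.
# ISO-8859-1 is only an example; use whatever the site actually declares.
scraped_bytes = "Å Ä Ö å ä ö".encode("iso-8859-1")

# Decode to text first, so the DOM never sees encoded bytes.
text = scraped_bytes.decode("iso-8859-1")

# Build a tiny document (hypothetical element names, not a full RSS feed).
doc = Document()
rss = doc.createElement("rss")
doc.appendChild(rss)
title = doc.createElement("title")
title.appendChild(doc.createTextNode(text))
rss.appendChild(title)

# Encode once, at output time. With an encoding argument, toxml()
# returns bytes and names that encoding in the XML declaration.
xml_bytes = doc.toxml(encoding="utf-8")
```

Writing xml_bytes to a file opened in binary mode ('wb') should keep the bytes on disk in line with the declared encoding.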



Diez


