convert strings to utf-8

Niclas nike.is.nospam at home.se
Sun Feb 25 16:43:00 EST 2007


Thank you!

I solved it with this:
  data.decode('latin_1')   # decode() already returns a unicode object
and when writing it to the file:
         f = codecs.open(path, encoding='utf-8', mode='w+')
         f.write(self.__rssDoc.toxml())
         f.close()
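The same fix in present-day Python 3 might look roughly like the sketch below. In Python 3, `bytes.decode()` returns `str` (there is no separate `unicode` type), and the built-in `open()` with `encoding='utf-8'` does the job `codecs.open` did here. The helper names `build_rss_title` and `write_rss`, the sample text, and the assumption that the page is Latin-1 are all illustrative, not from the original script.

```python
import os
import tempfile
import xml.dom.minidom


def build_rss_title(text):
    """Build a tiny, hypothetical RSS-like document holding one <title>."""
    doc = xml.dom.minidom.Document()
    rss = doc.createElement("rss")
    rss.setAttribute("version", "2.0")
    doc.appendChild(rss)
    channel = doc.createElement("channel")
    rss.appendChild(channel)
    title = doc.createElement("title")
    title.appendChild(doc.createTextNode(text))
    channel.appendChild(title)
    return doc


def write_rss(path, doc):
    # open(..., encoding="utf-8") turns the text into UTF-8 bytes on disk;
    # the encoding argument to writexml() only adds a matching XML declaration.
    with open(path, "w", encoding="utf-8") as f:
        doc.writexml(f, encoding="utf-8")


# Bytes as they might arrive from a page served as Latin-1 (an assumption).
scraped = "Åsa äter räksmörgås".encode("latin_1")
text = scraped.decode("latin_1")  # bytes -> str, no characters lost
feed_path = os.path.join(tempfile.gettempdir(), "feed.xml")
write_rss(feed_path, build_rss_title(text))
```

Decoding once at the boundary and encoding once on output keeps everything in between as plain text, so minidom never has to guess at byte encodings.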

Diez B. Roggisch wrote:
> Niclas wrote:
>> Hi
>>
>> I'm having trouble working with the special characters in Swedish (Å Ä 
>> Ö å ä ö). The script parses and extracts information from a 
>> webpage. This works fine and I get all the data correctly. The 
>> information is then added to an RSS file (using 
>> xml.dom.minidom.Document() to create the file), and this is where it goes 
>> wrong. Letters like Å ä ö get messed up and the RSS file does not 
>> validate. How can I convert the data to UTF-8 without losing the 
>> special letters?
> 
> Show us some code and example text (although I know that is difficult to 
> get right over news/mail).
> 
> The basic idea is this:
> 
> scraped_byte_string = scrape_the_website()
> 
> output = scraped_byte_string.decode('website-encoding').encode('utf-8')
> 
> 
> 
> Diez
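Diez's decode-then-encode idea can be sketched in Python 3 as below. The function name `to_utf8` is hypothetical, and Latin-1 stands in for the real site encoding, which in practice you would take from the page's HTTP headers or meta tag. The last part shows that feeding the Latin-1 bytes straight through as if they were UTF-8 fails, which is one way a feed like this could end up invalid.

```python
def to_utf8(raw: bytes, site_encoding: str) -> bytes:
    """Re-encode bytes from a web page's encoding into UTF-8."""
    text = raw.decode(site_encoding)  # bytes -> str using the page's encoding
    return text.encode("utf-8")       # str -> UTF-8 bytes


latin1_bytes = "ÅÄÖ åäö".encode("latin_1")
utf8_bytes = to_utf8(latin1_bytes, "latin_1")
print(latin1_bytes)  # b'\xc5\xc4\xd6 \xe5\xe4\xf6'
print(utf8_bytes)    # b'\xc3\x85\xc3\x84\xc3\x96 \xc3\xa5\xc3\xa4\xc3\xb6'

# Skipping the decode step and treating Latin-1 bytes as UTF-8 raises an error.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

Each Swedish letter is one byte in Latin-1 but two bytes in UTF-8, so the conversion has to go through decoded text rather than patching bytes directly.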
