error when parsing xml

Diez B. Roggisch deets at nospam.web.de
Mon Sep 5 09:03:35 EDT 2005


Odd-R. wrote:
> This is retrieved through a webservice and stored in a variable test
> 
> <?xml version='1.0' encoding='utf-8'?>
> <!-- DTD for xmltest-->
> <!DOCTYPE testtest [ <!ELEMENT testtest ( test*)>
> <!ELEMENT test (#PCDATA)>]>
> <testtest><test>æøå</test></testtest>
> 
> printing this out yields no problems, so the trouble seems to be when executing
> the following:
> 
> doc = minidom.parseString(test)

You need to do

doc = minidom.parseString(test.encode("utf-8"))

The reason is simple: test is not a string, but a unicode object.
XML parsers work with byte strings, so passing a unicode object to them
makes Python convert it first, using the default encoding, which is ascii.
That conversion fails on the non-ascii characters æøå. BTW, I used
encode("utf-8") because the header of your document says so. If it
said latin1, you'd need encode("latin1") instead. There is plenty of
unicode-related material out there - use google to search this NG or
the web.
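
To make the above concrete, here is a minimal sketch, not a definitive
recipe. It assumes Python 3, where str plays the role of the unicode
object discussed here and bytes the role of the old byte string. Python 3
is more forgiving about what parseString accepts, so the ascii failure
is reproduced explicitly. The variable test is a hypothetical stand-in
for what the webservice returns, and the latin1 variant is only there to
illustrate that the encoding you pass to encode() has to match the header.

```python
from xml.dom import minidom

# Hypothetical stand-in for the text returned by the webservice.
# \u00e6\u00f8\u00e5 are the three non-ascii letters from the original post.
test = (
    "<?xml version='1.0' encoding='utf-8'?>\n"
    "<!-- DTD for xmltest-->\n"
    "<!DOCTYPE testtest [ <!ELEMENT testtest ( test*)>\n"
    "<!ELEMENT test (#PCDATA)>]>\n"
    "<testtest><test>\u00e6\u00f8\u00e5</test></testtest>"
)

# Converting the text with the ascii codec fails on the non-ascii letters.
try:
    test.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Encode explicitly with the encoding named in the XML header, then parse.
doc = minidom.parseString(test.encode("utf-8"))
text = doc.getElementsByTagName("test")[0].firstChild.data

# If the header declared latin1 instead, the bytes must be latin1 as well.
test_latin1 = test.replace("encoding='utf-8'", "encoding='iso-8859-1'")
doc_latin1 = minidom.parseString(test_latin1.encode("latin-1"))
text_latin1 = doc_latin1.getElementsByTagName("test")[0].firstChild.data
```

In both cases the parsed text node holds the same three characters; only
the bytes handed to the parser differ.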

Diez


