writing Unicode objects to XML

Alessio Pace puccio_13 at yahoo.it
Mon May 5 06:09:50 EDT 2003


<posted & mailed>

Alex Martelli wrote:

> <posted & mailed>
> 
> Alessio Pace wrote:
> 
>> The first step, reading from XML (encoded in UTF-8), has been
>> accomplished through xml.dom.minidom.
>> I get Unicode strings, and that's all right. But if I want Python to
>> modify that source XML file, how should I do it? I mean, I need those
>> Unicode objects, such as u'n\xe8', to be stored exactly as they were
>> before I read them, so in the UTF-8 file it would be just n&#xe8;. What
>> is the final step to do this? I am going crazy with all this XML and
>> Unicode.. :-(
> 
> Here's a sample use:
> 
> >>> s
> '<?xml version="1.0" encoding="utf-8"?>\n<foo>n\xc3\xa8</foo>'
> >>> x=xml.dom.minidom.parseString(s)
> >>> x.toxml(encoding='iso-8859-1')
> '<?xml version="1.0" encoding="iso-8859-1"?>\n<foo>n\xe8</foo>'
> >>> x.toxml(encoding='utf-8')
> '<?xml version="1.0" encoding="utf-8"?>\n<foo>n\xc3\xa8</foo>'
> >>>
> 
> Of course, you would most often DO something to x between the
> time you parse it in and the time you write it back out, but in
> any case the 'encoding' is the key -- both in the xml declaration
> AND as a keyword parameter to the toxml method.
> 
> 
> Alex
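
For reference, the quoted session can be written as a small script. This is
only a sketch and assumes a present-day Python 3 interpreter (the session
above is Python 2), where the source is a bytes object and toxml() returns
bytes whenever an encoding is given. The variable names are my own.

```python
import xml.dom.minidom

# UTF-8 encoded source document as bytes (same sample as the quoted session).
source = b'<?xml version="1.0" encoding="utf-8"?>\n<foo>n\xc3\xa8</foo>'

doc = xml.dom.minidom.parseString(source)

# Text nodes come back as Unicode strings, whatever the file encoding was.
text = doc.documentElement.firstChild.data

# Passing encoding= makes toxml() return bytes in that encoding and
# puts the matching name into the XML declaration.
as_latin1 = doc.toxml(encoding='iso-8859-1')
as_utf8 = doc.toxml(encoding='utf-8')

print(repr(text))
print(as_latin1)
print(as_utf8)
```

The same character ends up as one byte in the iso-8859-1 output and as two
bytes in the UTF-8 output; the DOM tree itself is unchanged.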

Maybe I am missing something, because I tried that, but in the resulting new
XML file I don't see what I expect. Starting again: I have an XML file
declared as encoded in UTF-8 (by the way, is that the default if I don't
specify anything?) which contains character references such as
&#xe8; and some others in the Text nodes. I parse it with
xml.dom.minidom.parse(pathToFile) and get a reference to a DOM tree; let's
call this variable 'xmldoc'. Now, let's say I want to store this DOM
tree again (because my application will have to modify some parameters in it).
I thought I just had to do:
f = codecs.open('file.xml', 'w', 'utf8')
f.write(xmldoc.toxml(encoding='utf-8') )
f.close()
But the result is not the original XML.
My default encoding (as reported by sys.getdefaultencoding()) is iso-8859-1,
set in the sitecustomize.py script in Python's site-packages directory.
Thank you in advance.
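
To make the symptom concrete, here is a minimal sketch of what I suspect is
going on, written for a present-day Python 3 interpreter (an assumption; the
snippet above is Python 2). The idea is that toxml(encoding='utf-8') already
returns encoded bytes, so pushing them through a second encoding writer, with
iso-8859-1 as the default encoding, would encode them twice. The
double-encoding step is simulated by hand, and the temporary file path is
hypothetical.

```python
import os
import tempfile
import xml.dom.minidom

# Source with a character reference for e-grave, as in the file described above.
source = b'<?xml version="1.0" encoding="utf-8"?>\n<foo>n&#xe8;</foo>'
doc = xml.dom.minidom.parseString(source)

# After parsing, the reference is an ordinary character in the text node.
text = doc.documentElement.firstChild.data

# toxml(encoding=...) already returns encoded bytes.
encoded = doc.toxml(encoding='utf-8')

# Simulated mistake: read those UTF-8 bytes as iso-8859-1 text and encode
# them to UTF-8 a second time.
double_encoded = encoded.decode('iso-8859-1').encode('utf-8')

# With an ASCII target, characters outside ASCII are written as numeric
# character references (decimal rather than hexadecimal).
ascii_out = doc.toxml(encoding='us-ascii')

# Possible fix: write the bytes unchanged through a binary-mode file.
path = os.path.join(tempfile.mkdtemp(), 'file.xml')  # hypothetical location
with open(path, 'wb') as f:
    f.write(encoded)

# Parsing the written file gives back the same text.
reparsed_text = xml.dom.minidom.parse(path).documentElement.firstChild.data

print(encoded)
print(double_encoded)
print(ascii_out)
```

Note that the UTF-8 output holds the raw character instead of the &#xe8;
reference. The two forms mean the same thing to any XML parser, so the file
is equivalent to the original even though it is not byte-for-byte identical.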

-- 
bye
Alessio Pace



