minidom and unicode errors

Abhimanyu Seth abhimanyu.seth at gmail.com
Tue Mar 7 00:33:34 EST 2006


Hi all,

I'm trying to parse and modify an XML document using the xml.dom.minidom
module with Python 2.4.2.

>>> from xml.dom import minidom
>>> dom = minidom.parse("c:/test.txt")

If the XML file contains a non-ASCII character, I get a parse error.
I have the following line in my XML file:
<target>Exception beim Löschen des Audit-Moduls aufgetreten. Exception Stack
lautet: %1.</target>
and parsing fails with:
ExpatError: not well-formed (invalid token): line 8, column 27

If I remove the ö character, it works fine. I'm guessing this has to do
with the default encoding, which is ASCII. I guess I can change the encoding
by modifying a file on my machine that the interpreter reads at startup,
but then how do I get my program to work on different machines?
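
A minimal sketch of one way this parse error can arise, under an assumption the
post does not confirm: that the file is saved as Latin-1 (or a similar 8-bit
encoding) and carries no XML encoding declaration. In that case expat reads the
bytes as UTF-8 and rejects the byte for ö. The sample text and the try_parse
helper are hypothetical, for illustration only.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample text; u"\xf6" is the "ö" character.
text = u"<target>Exception beim L\xf6schen</target>"


def try_parse(data):
    """Return 'ok' if minidom parses the byte string, else 'ExpatError'."""
    try:
        minidom.parseString(data).unlink()
        return "ok"
    except ExpatError:
        return "ExpatError"


# Latin-1 bytes with no declaration: the parser assumes UTF-8,
# and the lone byte 0xF6 is not valid UTF-8.
undeclared = try_parse(text.encode("latin-1"))

# Same bytes, but the declaration names the encoding actually used.
declaration = '<?xml version="1.0" encoding="iso-8859-1"?>'.encode("ascii")
declared = try_parse(declaration + text.encode("latin-1"))

# The same text saved as UTF-8 needs no declaration.
as_utf8 = try_parse(text.encode("utf-8"))

print("%s %s %s" % (undeclared, declared, as_utf8))
```

If this assumption matches the real file, the behaviour depends on the file's
own bytes and declaration rather than on a per-machine interpreter setting.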

Also, when writing such a special character to a file, I get an error:
>>> document.writexml(file(myFile, "w"), encoding='utf-8')

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position
16: ordinal not in range(128)
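
A hedged sketch of one possible way around the write error. As far as I can
tell, the encoding argument of writexml only sets the XML declaration and does
not encode the text handed to the file object, so the ascii codec is still used
on write. Asking toxml for encoded bytes and writing them in binary mode avoids
that step. The one-element document and the temporary output path below are
hypothetical stand-ins for the real ones.

```python
import os
import tempfile
from xml.dom import minidom

# Hypothetical one-element document standing in for the real one.
document = minidom.parseString(u"<target>L\xf6schen</target>".encode("utf-8"))

# Hypothetical output path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "out.xml")

# toxml(encoding=...) returns encoded bytes with a matching XML declaration,
# so the file is opened in binary mode and nothing is encoded implicitly.
out = open(path, "wb")
try:
    out.write(document.toxml(encoding="utf-8"))
finally:
    out.close()

# Reading the file back recovers the original character.
reparsed = minidom.parse(path)
text = reparsed.documentElement.firstChild.data
```

Because the declaration written to the file names utf-8, the same file should
parse the same way on any machine.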

Any help would be appreciated.

--
Regards,
Abhimanyu