doc.toxml() gives ASCII encoding error
Jim Hefferon
jhefferon at smcvt.edu
Wed Feb 18 17:28:15 EST 2004
Hello,
I'm having trouble with .xml files that contain non-ASCII characters.
Here is a small example.
................................
#!/usr/bin/python2.2
import sys, os, os.path, re
import xml.dom.minidom
doc=xml.dom.minidom.parse(sys.argv[1])
print doc.toxml()
...............................
On an .xml file that contains only ASCII characters, it works just fine.
But one of my documents contains the string
<name>Martin Schröder</name>
and running the above script on that file gives:
Traceback (most recent call last):
  File "/home/web/catalogue_read.py", line 6, in ?
    print doc.toxml()
UnicodeError: ASCII encoding error: ordinal not in range(128)
I had the idea that the parser reads the xml declaration in the .xml
file (it is UTF-8), encodes the text parts into whatever is the
internal representation for unicode, and then .toxml sends it back out
again as a python unicode string. But I can't reconcile that idea
with this outcome.
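The mental model above can be checked with a small sketch. It is only an illustration under stated assumptions: it is written for a current Python 3 interpreter rather than the 2.2 used above, the document is a hypothetical in-memory stand-in for the file on disk, and the names raw, text, ascii_ok and utf8_bytes are made up for the example. It suggests that toxml() does hand back a Unicode string, and that the error appears only when that string is converted to ASCII on output.

```python
import xml.dom.minidom

# Hypothetical stand-in for the UTF-8 encoded .xml file on disk.
raw = '<?xml version="1.0" encoding="UTF-8"?><name>Martin Schr\u00f6der</name>'.encode("utf-8")

# The parser decodes the UTF-8 bytes into Unicode text nodes.
doc = xml.dom.minidom.parseString(raw)

# With no encoding argument, toxml() returns a Unicode (text) string.
text = doc.toxml()

# Converting that string to ASCII is the step that fails,
# because the o-umlaut is outside range(128).
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeError:
    ascii_ok = False

# Asking for an explicit encoding yields bytes that can be written out safely.
utf8_bytes = doc.toxml(encoding="utf-8")
```

On Python 2.2 the corresponding step would presumably be to encode the result explicitly before printing, for example print doc.toxml().encode("utf-8"), since the encoding argument to toxml() may not be available in that version.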
I'm simply lost; can anyone tell me what (no doubt clueless) thing
that I am doing wrong? I'm running under Fedora, so I have Python 2.2,
if that's any help.
Thanks,
Jim