doc.toxml() gives ASCII encoding error

Jim Hefferon jhefferon at smcvt.edu
Wed Feb 18 17:28:15 EST 2004


Hello,

I'm having trouble with .xml files that contain non-ASCII characters.
Here is a small example.

................................
#!/usr/bin/python2.2
import sys, os, os.path, re
import xml.dom.minidom

doc=xml.dom.minidom.parse(sys.argv[1])
print doc.toxml()
...............................

On an .xml file that contains only ASCII characters, it works just fine.
But one of my documents contains the string
       <name>Martin Schröder</name>
and running the above script on that file gives:
  Traceback (most recent call last):
    File "/home/web/catalogue_read.py", line 6, in ?
      print doc.toxml()
  UnicodeError: ASCII encoding error: ordinal not in range(128)

I had the idea that the parser reads the XML declaration in the .xml
file (it is UTF-8), decodes the text parts into whatever the internal
representation for Unicode is, and then .toxml() sends it back out
again as a Python Unicode string.  But I can't reconcile that idea
with this outcome.
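
The model described above can be checked with a small sketch. It is
written for a current Python 3 interpreter, which is an assumption (the
script above targets Python 2.2), and the sample document and the helper
names roundtrip and ascii_fails are hypothetical. It suggests that
toxml() does return Unicode text, and that the failure appears only when
that text is encoded as ASCII, which is presumably what the print
statement did implicitly.

```python
import xml.dom.minidom

# Hypothetical sample document, encoded as UTF-8 to match the declaration.
XML_BYTES = (
    u'<?xml version="1.0" encoding="UTF-8"?>'
    u'<name>Martin Schr\u00f6der</name>'
).encode("utf-8")


def roundtrip(xml_bytes):
    """Parse UTF-8 bytes; return the Unicode text and its UTF-8 encoding."""
    doc = xml.dom.minidom.parseString(xml_bytes)
    text = doc.toxml()           # Unicode text, not encoded bytes
    data = text.encode("utf-8")  # explicit encoding, chosen by the caller
    return text, data


def ascii_fails(text):
    """Return True if the text cannot be encoded as ASCII."""
    try:
        text.encode("ascii")
    except UnicodeError:
        return True
    return False


if __name__ == "__main__":
    text, data = roundtrip(XML_BYTES)
    print(ascii_fails(text))  # the o-umlaut has no ASCII encoding
    print(repr(data))         # UTF-8 bytes, shown in ASCII-safe form
```

Under Python 2.2 the analogous step would presumably be to encode
explicitly before printing, for example
print doc.toxml().encode("utf-8"), rather than relying on the default
ASCII codec.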

I'm simply lost; can anyone tell me what (no doubt clueless) thing
I am doing wrong?  I'm running under Fedora, so I have Python 2.2, if
that's any help.

Thanks,
Jim


