Sax2 encoding
Juan M. Casillas
juanm.casillas at eresmas.com
Fri Aug 30 06:20:43 EDT 2002
Hello folks!
I have an xml document that only begins with
<?xml version="1.0"?>
[...]
That is, without any information about the encoding. The document contains
special characters encoded in ISO-8859-1 (Spanish characters such as
á or ñ). When I parse the document with expat it works fine, but
I have to give it the default encoding explicitly:
import xml.parsers.expat
import sys
p = xml.parsers.expat.ParserCreate('ISO-8859-1')
[...]
f = open(sys.argv[1])
xmldocument = f.read()
f.close()
p.Parse(xmldocument)
But I need DOM ... and here comes my problem! When I create the
DOM object the same way the documentation says:
import sys
from xml.dom.ext.reader import Sax2
# create Reader object
reader = Sax2.Reader()
# parse the document
f = open(sys.argv[1])
doc = reader.fromStream(f)
f.close()
It bombs and gives me the following error:
Traceback (most recent call last):
  File "./parser2.py", line 11, in ?
    doc = reader.fromStream(f)
  File "/usr/local/python2/lib/python2.2/site-packages/_xmlplus/dom/ext/reader/Sax2.py", line 373, in fromStream
    self.parser.parse(s)
  File "/usr/local/python2/lib/python2.2/site-packages/_xmlplus/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/local/python2/lib/python2.2/site-packages/_xmlplus/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/local/python2/lib/python2.2/site-packages/_xmlplus/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/usr/local/python2/lib/python2.2/site-packages/_xmlplus/dom/ext/reader/Sax2.py", line 341, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: efe56.00.xml:6:43: not well-formed (invalid token)
and poking around the file, I found an 'á' character at this position.
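To illustrate what seems to be happening, here is a minimal sketch that reproduces the failure with plain expat. It is written for a current Python rather than the 2.2 used above, and the tiny DOCUMENT and the try_parse helper are made up for the illustration; they are not part of my real script. The assumption is that, with no encoding declaration and no override, the bytes get treated as UTF-8, where a lone 0xE1 ('á' in ISO-8859-1) is not a valid sequence.

```python
import xml.parsers.expat
from xml.parsers.expat import ExpatError

# Hypothetical stand-in document: no encoding declaration, and the
# ISO-8859-1 byte for 'á' (0xE1) inside the element content.
DOCUMENT = b'<?xml version="1.0"?>\n<doc>\xe1</doc>'


def try_parse(data, encoding=None):
    """Parse data with expat; return None on success or the error text."""
    # encoding=None lets expat pick its default; a name overrides it.
    parser = xml.parsers.expat.ParserCreate(encoding)
    try:
        parser.Parse(data, True)
    except ExpatError as exc:
        return xml.parsers.expat.ErrorString(exc.code)
    return None


# Without an override the parse is rejected; with ISO-8859-1 it goes through.
print("default encoding:", try_parse(DOCUMENT))
print("ISO-8859-1:      ", try_parse(DOCUMENT, 'ISO-8859-1'))
```

This only shows the behaviour of expat itself; it says nothing yet about how to pass the same override through the Sax2 reader.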
So my question is: how can I set the default encoding for the Sax2
reader so that the XML parser works for me?
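One possible way around it, sketched below, is to transcode the bytes to UTF-8 before they reach the parser, so that no encoding override is needed at all. This is only a sketch under assumptions: it uses xml.dom.minidom from the standard library of a current Python instead of the Sax2 reader, the function name parse_latin1_document is invented for the example, and it relies on the document having no encoding declaration (a declaration naming another encoding would conflict with the transcoded bytes).

```python
import xml.dom.minidom


def parse_latin1_document(raw_bytes):
    """Build a DOM from ISO-8859-1 bytes that carry no encoding declaration."""
    # Decode with the encoding the file is known to use...
    text = raw_bytes.decode('iso-8859-1')
    # ...and hand the parser UTF-8, which is what it assumes by default.
    return xml.dom.minidom.parseString(text.encode('utf-8'))


# Hypothetical sample input containing 'ñ' as the ISO-8859-1 byte 0xF1.
doc = parse_latin1_document(b'<?xml version="1.0"?>\n<doc>ma\xf1ana</doc>')
print(doc.documentElement.firstChild.data)
```

Whether the Sax2 reader offers a direct setting for this is still what I would like to know.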
Thanks in advance,
Python Rocks!
Juan M. Casillas