Sax2 encoding

Juan M. Casillas juanm.casillas at eresmas.com
Fri Aug 30 06:20:43 EDT 2002


Hello folks!


I have an XML document that begins with only

<?xml version="1.0"?>
[...]


That is, with no information about the encoding. The document contains
special characters encoded in ISO-8859-1 (Spanish characters such as
á or ñ). When I parse the document with expat it works fine, but only
if I give the parser the encoding explicitly:

import xml.parsers.expat
import sys

p = xml.parsers.expat.ParserCreate('ISO-8859-1')

[...]

f = open(sys.argv[1])
xmldocument = f.read()
f.close()

p.Parse(xmldocument)
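
A self-contained sketch of the same idea, for anyone who wants to reproduce it without my file. It assumes a current Python 3 standard library; the sample bytes and the helper name collect_text are made up for illustration and are not part of my script:

```python
import xml.parsers.expat

# Hypothetical sample: ISO-8859-1 bytes with no encoding declaration.
# \xe1 is "a with acute" and \xf1 is "n with tilde" in ISO-8859-1.
SAMPLE = b'<?xml version="1.0"?><doc>\xe1 \xf1</doc>'


def collect_text(data, encoding=None):
    """Parse data with expat and return the concatenated character data.

    encoding is passed to ParserCreate() and overrides whatever the
    document itself declares (here: nothing).
    """
    chunks = []
    parser = xml.parsers.expat.ParserCreate(encoding)
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)  # True marks this as the final chunk
    return "".join(chunks)


# With the override, expat decodes the two characters correctly.
print(ascii(collect_text(SAMPLE, "ISO-8859-1")))

# Without it, expat assumes UTF-8 and rejects the first non-ASCII byte.
try:
    collect_text(SAMPLE)
except xml.parsers.expat.ExpatError as exc:
    print("expat error:", exc)
```

The second call fails the same way the DOM reader does, which suggests that the only missing piece is a way to pass the encoding through.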


But I need DOM, and here comes my problem. When I create the DOM
object the same way the documentation says:

import sys
from xml.dom.ext.reader import Sax2

# create Reader object
reader = Sax2.Reader()

# parse the document
f = open(sys.argv[1])
doc = reader.fromStream(f)
f.close()

It bombs and gives me the following error:

Traceback (most recent call last):
  File "./parser2.py", line 11, in ?
    doc = reader.fromStream(f)
  File "/usr/local/python2/lib/python2.2/site-packages/_xmlplus/dom/ext/reader/Sax2.py", line 373, in fromStream
    self.parser.parse(s)
  File "/usr/local/python2/lib/python2.2/site-packages/_xmlplus/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/local/python2/lib/python2.2/site-packages/_xmlplus/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/local/python2/lib/python2.2/site-packages/_xmlplus/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/usr/local/python2/lib/python2.2/site-packages/_xmlplus/dom/ext/reader/Sax2.py", line 341, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: efe56.00.xml:6:43: not well-formed (invalid token)


Poking around the file, I found an 'á' character at that position.
So my question is: how can I set the default encoding for the Sax2
reader so that the XML parser works for me?
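
One possible workaround, sketched under assumptions rather than offered as the answer: since the document declares no encoding, the bytes can be transcoded from ISO-8859-1 to UTF-8 (the XML default) before any parser sees them. The sketch uses xml.dom.minidom from a current Python 3 standard library instead of the PyXML Sax2 reader, and the sample bytes and the helper name parse_latin1 are hypothetical:

```python
import xml.dom.minidom

# Hypothetical stand-in for the real file: ISO-8859-1 bytes, no
# encoding declaration. \xf1 is "n with tilde" in ISO-8859-1.
RAW = b'<?xml version="1.0"?><doc>a\xf1o</doc>'


def parse_latin1(raw_bytes):
    """Build a DOM from ISO-8859-1 bytes that carry no encoding declaration.

    The bytes are decoded as ISO-8859-1 and re-encoded as UTF-8, which is
    what an XML parser assumes when nothing else is declared. This would
    not be appropriate for a document that does declare an encoding.
    """
    text = raw_bytes.decode("iso-8859-1")
    return xml.dom.minidom.parseString(text.encode("utf-8"))


doc = parse_latin1(RAW)
# The text node now holds the correctly decoded string.
print(ascii(doc.documentElement.firstChild.data))
```

This sidesteps the question rather than answering it, so a way to set the encoding on the Sax2 reader itself would still be welcome.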

Thanks in advance,
Python Rocks!

Juan M. Casillas



