[XML-SIG] SAX encoding and special characters

Thomas thomasj at eworld.hu
Sat Apr 17 05:15:22 EDT 2004


Hello,

I'm playing with SAX with Python-2.3.3. My goal is to parse XML files
(I don't want to generate them).
My XML file starts with:
<?xml version="1.0" encoding="iso-8859-2" ?>
I would like to get the encoding before parsing (I would like to use
it in ContentHandler class). Is there a way to get encoding from the
XML file with SAX? I tried to open the file with InputStream and ask
with getEncoding() but it returned None all the time.
Is the encoding given in the XML file used by SAX?
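
To make the first question concrete, here is a minimal sketch of one
possible workaround: reading the declared encoding straight from the
first bytes of the file before handing it to SAX. It is written for
current Python 3 rather than 2.3. The helper name
sniff_declared_encoding and its default argument are hypothetical and
not part of xml.sax. It also assumes an ASCII-compatible encoding such
as iso-8859-2 and no byte order mark, so it would not work for UTF-16
files.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration that
# sits at the very start of the file (ASCII-compatible encodings only).
_DECL_RE = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_declared_encoding(path, default="utf-8"):
    """Return the encoding named in the XML declaration, or `default`.

    Hypothetical helper: it only inspects the first 200 bytes and does
    not parse the document.
    """
    with open(path, "rb") as f:
        head = f.read(200)
    match = _DECL_RE.match(head)
    if match is None:
        return default
    return match.group(1).decode("ascii")
```

The result could then be passed to the ContentHandler's constructor
before calling parse(). As far as I understand, the expat-based reader
decodes the input itself and passes Unicode strings to the handler, so
the declared encoding should only be needed for re-encoding output.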

My second problem/question is about special characters in XML.
Sometimes I have control characters (character codes 0-31) in the XML
and the parser ends with:
Traceback (most recent call last):
  File "./testXML.py", line 175, in ?
    saxparser.parse(sys.argv[1])
  File "/usr/local/lib/python2.3/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/local/lib/python2.3/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/local/lib/python2.3/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/usr/local/lib/python2.3/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: spec_char.xml:68271:61: not well-formed (invalid token)

I would like to just ignore/drop the problematic character. How can I
do that? I thought about installing an ErrorHandler, but I think it can
only catch the error and cannot continue processing the problematic
field.
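
To illustrate what I mean by dropping the characters, here is a rough
sketch of a pre-filtering approach: strip the offending bytes from each
chunk before feeding it to the parser. It is written for current
Python 3 rather than 2.3. The function name
parse_dropping_control_bytes and the chunk_size parameter are
hypothetical. It assumes a single-byte encoding such as iso-8859-2 (or
UTF-8), where bytes 0-31 can only ever be control characters; it would
corrupt UTF-16 input.

```python
import re
import xml.sax

# Bytes 0-31 that XML 1.0 does not allow: all of them except
# tab (0x09), line feed (0x0A) and carriage return (0x0D).
_BAD_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def parse_dropping_control_bytes(path, handler, chunk_size=64 * 1024):
    """Parse `path` with SAX, silently removing illegal control bytes.

    Hypothetical helper: the file on disk is not modified; the bytes
    are filtered chunk by chunk on their way into the parser.
    """
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Each bad byte is a single byte, so filtering per chunk
            # is safe even when a chunk boundary falls mid-document.
            parser.feed(_BAD_BYTES.sub(b"", chunk))
    parser.close()
```

This only removes raw control bytes. A character reference such as
&#1; is also not well-formed in XML 1.0 and would still raise the same
SAXParseException.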

I searched the net for some hours but didn't find any solution.

I would be happy to get some ideas.

Thanks in advance,
        Thomas

My current code is:
import sys
from xml.sax import make_parser

oh = optionsHandler()   # my ContentHandler subclass
saxparser = make_parser()
saxparser.setContentHandler(oh)
saxparser.parse(sys.argv[1])
The optionsHandler class works fine.



