xml.sax and reserved XML characters in the data stream

Eric I. Arnoth earnoth at comcast.net
Sun Feb 2 18:27:43 EST 2003


I'm attempting to write an XML parser using xml.sax, but I've hit a problem 
where the data in the XML document contains "<" and ">" characters.  How 
does one handle that?  I've scoured the Python documentation and Googled 
the hell out of it on the web & groups, but can find nothing.

Here's the error:
================================================================================
jefferson6:17pm[143]> ./my_xml_parser.py -f furball.short.xml
Traceback (most recent call last):
  File "./my_xml_parser.py", line 410, in ?
    my_xml_obj = myContentHandler(filename)
  File "./my_xml_parser.py", line 117, in __init__
    parser.parse(open(filename))
  File "/usr/lib/python2.2/xml/sax/expatreader.py", line 90, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.2/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.2/xml/sax/expatreader.py", line 148, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python2.2/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: furball.short.xml:127:89: not 
well-formed (invalid token)
================================================================================

Here's the referenced chunk of the file:
================================================================================
jefferson6:18pm[144]> head -128 furball.short.xml | tail -1
                        <setting name="Misc information on News 
server[entry]:From address :" value="email <listme at foobar.org>"/>
================================================================================
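
The failure seems to reproduce with a one-line document, which suggests the raw "<" inside the attribute value is what the parser rejects (a literal ">" is allowed in an attribute value, "<" is not). A minimal sketch, assuming a current Python 3 interpreter rather than the 2.2 install in the traceback; the AttrCollector class, the try_parse helper, and the shortened attribute name are made up for illustration:

```python
import xml.sax
from xml.sax.handler import ContentHandler


class AttrCollector(ContentHandler):
    """Hypothetical handler that records every 'value' attribute it sees."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.values = []

    def startElement(self, name, attrs):
        value = attrs.get("value")
        if value is not None:
            self.values.append(value)


# Same shape as the offending line: a raw "<" inside an attribute value.
BAD = b'<setting name="From address :" value="email <listme at foobar.org>"/>'
# The well-formed equivalent, with the brackets written as entity references.
GOOD = b'<setting name="From address :" value="email &lt;listme at foobar.org&gt;"/>'


def try_parse(doc):
    """Return (True, values) on success or (False, message) on a parse error."""
    handler = AttrCollector()
    try:
        xml.sax.parseString(doc, handler)
    except xml.sax.SAXParseException as exc:
        return False, exc.getMessage()
    return True, handler.values


print(try_parse(BAD))
print(try_parse(GOOD))
```

When the escaped form parses, the handler receives the value with the entity references already decoded back to "<" and ">".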

I have no choice but to deal with this, as the XML file is produced by a 
third-party application and I have no control over the format or the 
content.  I had thought about pre-processing it before sending it through
the xml.sax ContentHandler, but I suspect this is not a unique
problem.  As such, I'm wondering whether the xml.sax code has a way of
handling it (if so, I can't find it in the docs) or whether there is an
established way of pre-processing it (if so, I can't find it on the web).
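
The pre-processing idea might look roughly like the sketch below: escape "<" and ">" inside double-quoted attribute values, then hand the cleaned text to xml.sax. This is only a sketch under stated assumptions, not an established recipe. It assumes Python 3, that every attribute value is double-quoted and contains no embedded double quote, and that the sequence =" never appears in element text, comments, or CDATA. The names escape_attr_brackets and SettingCollector are hypothetical.

```python
import re
import xml.sax
from xml.sax.handler import ContentHandler

# Assumption: attribute values are always double-quoted with no embedded '"'.
ATTR_VALUE = re.compile(r'="([^"]*)"')


def escape_attr_brackets(text):
    """Rewrite raw '<' and '>' inside attribute values as entity references."""

    def fix(match):
        value = match.group(1).replace("<", "&lt;").replace(">", "&gt;")
        return '="' + value + '"'

    return ATTR_VALUE.sub(fix, text)


class SettingCollector(ContentHandler):
    """Hypothetical handler: maps each setting's name to its value."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.settings = {}

    def startElement(self, name, attrs):
        if name == "setting":
            self.settings[attrs.get("name")] = attrs.get("value")


def parse_settings(raw_text):
    """Clean the text, parse it, and return the collected settings."""
    handler = SettingCollector()
    cleaned = escape_attr_brackets(raw_text)
    xml.sax.parseString(cleaned.encode("utf-8"), handler)
    return handler.settings
```

Usage would be something like parse_settings(open(filename).read()). Already-escaped values pass through unchanged, since only literal "<" and ">" characters are rewritten. A stray "&" in a value would still trip the parser and would need similar treatment.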

Any help would be greatly appreciated.


-- 
Eric I. Arnoth    CISSP (http://www.isc2.org)        
earnoth at comcast.net                
http://mywebpages.comcast.net/earnoth
¤ø,¸¸,ø¤º°*°º¤ø,¸¸,ø¤ø,¸¸,ø¤º°*°º¤ø,¸¸,ø¤ø,¸¸,ø¤º°*°º¤ø,¸¸,ø¤ø,¸¸,ø¤º°*°º¤ø,¸¸,ø



