xml.sax and reserved XML characters in the data stream
Eric I. Arnoth
earnoth at comcast.net
Sun Feb 2 18:27:43 EST 2003
I'm attempting to write an XML parser using xml.sax, but I've hit a problem
where the data in the XML document contains "<" and ">" characters. How
does one handle that? I've scoured the Python documentation and Googled
the hell out of it on the web & groups, but can find nothing.
Here's the error:
================================================================================
jefferson6:17pm[143]> ./my_xml_parser.py -f furball.short.xml
Traceback (most recent call last):
  File "./my_xml_parser.py", line 410, in ?
    my_xml_obj = myContentHandler(filename)
  File "./my_xml_parser.py", line 117, in __init__
    parser.parse(open(filename))
  File "/usr/lib/python2.2/xml/sax/expatreader.py", line 90, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.2/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.2/xml/sax/expatreader.py", line 148, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python2.2/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: furball.short.xml:127:89: not well-formed (invalid token)
================================================================================
Here's the referenced chunk of the file:
================================================================================
jefferson6:18pm[144]> head -128 furball.short.xml | tail -1
<setting name="Misc information on News
server[entry]:From address :" value="email <listme at foobar.org>"/>
================================================================================
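For reference, here is a minimal reproduction. It is written for Python 3, not the Python 2.2 shown in the traceback. It suggests that expat rejects the element as it stands and accepts it once the brackets in the value are written as &lt; and &gt;. The XML 1.0 specification does not allow a bare < inside an attribute value, so this is a well-formedness error raised by the parser itself, before any ContentHandler callback sees the data. SettingCollector and try_parse are hypothetical names used only for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# The offending element from the message, joined onto one line. The address
# is kept in the archive's obfuscated "at" form.
RAW = ('<setting name="Misc information on News server[entry]:From address :" '
       'value="email <listme at foobar.org>"/>')
# The same element with the brackets in the value escaped as entities.
ESCAPED = ('<setting name="Misc information on News server[entry]:From address :" '
           'value="email &lt;listme at foobar.org&gt;"/>')


class SettingCollector(ContentHandler):
    """Hypothetical handler: records the name/value pair of each <setting>."""

    def __init__(self):
        super().__init__()
        self.settings = []

    def startElement(self, name, attrs):
        if name == "setting":
            self.settings.append((attrs.get("name"), attrs.get("value")))


def try_parse(text):
    """Parse text with SAX; return the collected settings or an error string."""
    handler = SettingCollector()
    try:
        xml.sax.parseString(text.encode("utf-8"), handler)
    except xml.sax.SAXParseException as exc:
        # The default error handler raises on fatal (well-formedness) errors.
        return "error at line %d, column %d: %s" % (
            exc.getLineNumber(), exc.getColumnNumber(), exc.getMessage())
    return handler.settings


print(try_parse(RAW))      # an error string from expat's well-formedness check
print(try_parse(ESCAPED))  # [('Misc information on News server[entry]:From address :', 'email <listme at foobar.org>')]
```

Note that the handler receives the value already unescaped, so code downstream of the parser sees the original "<" and ">" characters.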
I have no choice but to deal with this, as the XML file is produced by a
third-party application and I have no control over the format or the
content. I had thought about pre-processing it before sending it through my
xml.sax ContentHandler, but I suspect this is not a unique problem. So I'm
wondering whether the xml.sax code has a way of handling it (if so, I can't
find it in the docs) or whether there is an established way of
pre-processing it (if so, I can't find it on the web either).
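As far as I know, xml.sax (which sits on top of expat here) has no option to accept non-well-formed input, so pre-processing looks like the practical route. The sketch below is Python 3 and rests on assumptions: it scans the text with a small state machine and escapes < and > only inside quoted attribute values. escape_lt_in_attributes and parse_lenient are hypothetical helpers, not part of xml.sax. The scan assumes that comments, CDATA sections and processing instructions contain no stray quote characters, and it does not repair other problems such as bare "&" characters or "<" in character data.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def escape_lt_in_attributes(text):
    """Hypothetical pre-processing helper: replace < and > that appear inside
    quoted attribute values with &lt; and &gt;, leaving all other text alone.

    Assumption: comments, CDATA and processing instructions hold no stray
    quote characters (an apostrophe in a comment would confuse the scan).
    """
    out = []
    in_tag = False   # True between a markup-opening '<' and its closing '>'
    quote = None     # quote character of the attribute value we are inside
    for ch in text:
        if quote is not None:
            # Inside an attribute value: escape brackets until the quote closes.
            if ch == quote:
                quote = None
                out.append(ch)
            elif ch == "<":
                out.append("&lt;")
            elif ch == ">":
                out.append("&gt;")
            else:
                out.append(ch)
        elif in_tag:
            # Inside a tag but outside any value: watch for quotes and the tag end.
            if ch in "\"'":
                quote = ch
            elif ch == ">":
                in_tag = False
            out.append(ch)
        else:
            # Character data between tags: a '<' starts new markup.
            if ch == "<":
                in_tag = True
            out.append(ch)
    return "".join(out)


def parse_lenient(path, handler, encoding="utf-8"):
    """Hypothetical wrapper: read the whole file, repair it, then hand it to SAX.
    Assumes the file fits in memory and is stored in the given encoding."""
    with open(path, encoding=encoding) as f:
        fixed = escape_lt_in_attributes(f.read())
    xml.sax.parseString(fixed.encode(encoding), handler)


if __name__ == "__main__":
    sample = ('<settings><setting name="From address :" '
              'value="email <listme at foobar.org>"/></settings>')

    class ValuePrinter(ContentHandler):
        def startElement(self, name, attrs):
            if name == "setting":
                print(attrs.get("value"))

    # Prints: email <listme at foobar.org>
    xml.sax.parseString(escape_lt_in_attributes(sample).encode("utf-8"),
                        ValuePrinter())
```

Escaping ">" is not strictly required, since the XML specification permits it in attribute values, but it is harmless. Something like parse_lenient(filename, handler) could stand in for the parser.parse(open(filename)) call, assuming no custom features are set on the parser object; a very large file would instead call for feeding repaired chunks to an incremental parser, carrying the scan state across chunk boundaries.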
Any help would be greatly appreciated.
--
Eric I. Arnoth CISSP (http://www.isc2.org)
earnoth at comcast.net
http://mywebpages.comcast.net/earnoth