Newbie XML SAX Parsing: How do I ignore an invalid token?

scott at crybabymaternity.com
Fri Jan 5 16:50:18 EST 2007


I've got an XML feed from a vendor that is not well-formed, and having
them change it is not an option.  I'm trying to figure out how to
create an error handler that will ignore the invalid token and continue
parsing.

The file is large, so I'd prefer not to load it all into memory, or to
save a copy and strip out the bad characters before I parse it.

I've included one of the problematic characters in a small XML snippet
below.

I'm new to Python, and I don't know how to accomplish this. Any help is
greatly appreciated!

-----------------------------------------------------------------

Here is my code:

from xml.sax import make_parser, handler
import StringIO

# Prints each problem the parser reports instead of raising it.
class ErrorHandler(handler.ErrorHandler):
    def __init__(self, parser):
        self.parser = parser
    def warning(self, msg):
        print('*** (ErrorHandler.warning) msg: %s' % msg)
    def error(self, msg):
        print('*** (ErrorHandler.error) msg: %s' % msg)
    def fatalError(self, msg):
        print(msg)

# No-op content handler; it subclasses the SAX base class by its
# qualified name so that the import is not shadowed.
class ContentHandler(handler.ContentHandler):
    def __init__(self):
        handler.ContentHandler.__init__(self)
    def startElement(self, name, attrs):
        pass
    def characters(self, ch):
        pass
    def endElement(self, name):
        pass

xmlstr = """
<cities>
  <city>
    <name>Tampa</name>
    <description>A great city 
 and place to live</description>
  </city>
  <city>
    <name>Clearwater</name>
    <description>Beautiful beaches</description>
  </city>
</cities>
"""
parser = make_parser()
curHandler = ContentHandler()
errorHandler = ErrorHandler(parser)
parser.setContentHandler(curHandler)
parser.setErrorHandler(errorHandler)
parser.parse(StringIO.StringIO(xmlstr))
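
A minimal sketch of one possible direction, not a confirmed fix: XML
treats well-formedness errors as fatal, so an error handler may not be
able to resume after the bad token. An alternative is to clean the data
as it streams into the parser, one chunk at a time, so the whole file is
never held in memory. The sketch below is written in Python 3 syntax
(unlike the code above) and rests on several assumptions: the offending
tokens are control characters that XML 1.0 forbids, the feed uses UTF-8
or another ASCII-compatible encoding, and the names parse_filtered and
NameCollector are hypothetical helpers. The \x0b in the sample is only a
stand-in for whatever character the real feed contains.

```python
import io
import re
from xml.sax import make_parser
from xml.sax.handler import ContentHandler

# Bytes XML 1.0 never allows in a document: the C0 control characters
# other than tab (0x09), line feed (0x0A), and carriage return (0x0D).
ILLEGAL_BYTES = re.compile(rb'[\x00-\x08\x0b\x0c\x0e-\x1f]')

def parse_filtered(stream, content_handler, chunk_size=64 * 1024):
    """Feed a binary file-like object to a SAX parser chunk by chunk,
    dropping illegal control bytes from each chunk first."""
    parser = make_parser()
    parser.setContentHandler(content_handler)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of input
        cleaned = ILLEGAL_BYTES.sub(b'', chunk)
        if cleaned:
            parser.feed(cleaned)
    parser.close()  # signals end of document to the parser

class NameCollector(ContentHandler):
    """Hypothetical handler that collects the text of <name> elements."""
    def __init__(self):
        ContentHandler.__init__(self)
        self.names = []
        self._in_name = False
    def startElement(self, name, attrs):
        if name == 'name':
            self._in_name = True
            self.names.append('')
    def characters(self, content):
        # Text may arrive in several pieces, so append rather than assign.
        if self._in_name:
            self.names[-1] += content
    def endElement(self, name):
        if name == 'name':
            self._in_name = False

# Stand-in sample: \x0b plays the role of the vendor's bad character.
sample = (b'<cities><city><name>Tampa</name>'
          b'<description>A great city \x0b and place to live</description>'
          b'</city><city><name>Clearwater</name></city></cities>')
collector = NameCollector()
parse_filtered(io.BytesIO(sample), collector, chunk_size=16)
print(collector.names)
```

Filtering at the byte level only works because bytes below 0x20 never
occur inside a multi-byte UTF-8 sequence; a UTF-16 feed would need to be
decoded before filtering. Other kinds of malformed markup, such as
unescaped ampersands, would need a different filter.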



