trouble with xml.sax and unknown entities

Antony Lesuisse al2000 at udev.org
Sun Apr 27 00:30:55 EDT 2003


I'm not on the list, please cc: me the answers.

I'm having trouble parsing the following XML with the default Python xml.sax
API. I'm using Python 2.2 on Debian unstable PowerPC (python2.2-xmlbase).

'<?xml version="1.0"?><html><body>hello&nbsp;</body></html>'

See the code at the end.

xml.sax._exceptions.SAXParseException: <unknown>:1:39: undefined entity

The parser halts on &nbsp; because it doesn't know this entity. The problem
is that I cannot find a way to tell it what this entity is.

(1)
Is there a way to get a callback when the parser reaches &nbsp;? None of the
following handler methods (resolveEntity, notationDecl, unparsedEntityDecl) is
called.

I thought resolveEntity had to be called in that situation, but I probably
misunderstand the SAX API.
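A sketch for (1), written for modern Python 3 rather than the 2.2 setup above, and not checked against 2.2: as far as I understand the SAX contract, resolveEntity is only consulted for external entities (for example an external DTD subset), never for a reference to a name nobody declared. With expat, an undeclared entity is a fatal "undefined entity" error only when the document has no external DTD subset. If the document carries a DOCTYPE with a system identifier that the parser does not load (feature_external_ges turned off), expat reports the reference through ContentHandler.skippedEntity instead, and the handler can substitute its own text. The DOCTYPE line, the "xhtml1-strict.dtd" name and the ENTITIES table are assumptions of this sketch.

```python
import io
import xml.sax
import xml.sax.handler

# Hypothetical replacement table: entity name -> replacement text.
ENTITIES = {"nbsp": "\u00a0"}

class CHandler(xml.sax.handler.ContentHandler):
    def __init__(self):
        super().__init__()
        self.text = []

    def characters(self, content):
        self.text.append(content)

    def skippedEntity(self, name):
        # Called for a reference to an entity the parser did not expand.
        # Known names get our replacement; unknown ones are kept as "&name;".
        self.text.append(ENTITIES.get(name, "&%s;" % name))

# The DOCTYPE with a system identifier is what turns the undeclared
# reference into a skipped entity; the DTD file itself is never read.
xmlstr = (b'<?xml version="1.0"?>'
          b'<!DOCTYPE html SYSTEM "xhtml1-strict.dtd">'
          b'<html><body>hello&nbsp;</body></html>')

parser = xml.sax.make_parser()
# Do not load external entities or the external DTD subset.
parser.setFeature(xml.sax.handler.feature_external_ges, False)
handler = CHandler()
parser.setContentHandler(handler)
parser.parse(io.BytesIO(xmlstr))
print(repr("".join(handler.text)))
```

Without the DOCTYPE the same input still fails with "undefined entity", because expat then treats the missing declaration as a well-formedness error and no handler is consulted.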

(2)
Is there a way to register entities before parsing begins?
Something like:
    parser.registerEntity('&nbsp;','blahblah')
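For (2), I don't know of a registerEntity-style call in xml.sax itself. A workaround, sketched here for Python 3, is to declare the entities in an internal DTD subset that is spliced into the text before parsing, so the parser reads the declarations before the document element. The helper name with_entities is hypothetical, and it assumes the root element is called "html" and that replacement values contain no double quote, "%" or bare "&" (a character reference such as "&#160;" is fine).

```python
import xml.sax
import xml.sax.handler

def with_entities(xml_text, entities):
    """Hypothetical helper: insert an internal DTD subset declaring
    `entities` (name -> replacement text) after the XML declaration."""
    decls = "".join('<!ENTITY %s "%s">' % (name, value)
                    for name, value in entities.items())
    # expat does not validate, so the subset only has to be well formed.
    doctype = "<!DOCTYPE html [%s]>" % decls
    if xml_text.startswith("<?xml"):
        # The DOCTYPE must come after the XML declaration.
        end = xml_text.index("?>") + 2
        return xml_text[:end] + doctype + xml_text[end:]
    return doctype + xml_text

class TextCollector(xml.sax.handler.ContentHandler):
    def __init__(self):
        super().__init__()
        self.text = []

    def characters(self, content):
        self.text.append(content)

xmlstr = '<?xml version="1.0"?><html><body>hello&nbsp;</body></html>'
handler = TextCollector()
patched = with_entities(xmlstr, {"nbsp": "&#160;"})
xml.sax.parseString(patched.encode("utf-8"), handler)
print(repr("".join(handler.text)))
```

The character reference inside the declaration is expanded when the declaration is read, so &nbsp; in the body ends up as U+00A0 in the characters() callback.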

(3)
Or is there a way to register an external DTD in which those entities are
defined? Something like:
    parser.registerExternalDTD('xhtml.dtd')
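For (3), the SAX hook that seems closest is the EntityResolver: if the document names an external DTD in its DOCTYPE and the feature_external_ges feature is switched on, the expat-based reader in Python 3 asks resolveEntity for that DTD, and the resolver can hand back a local copy instead of a URL. This is a sketch under those assumptions; the file name "xhtml-local.dtd", the LOCAL_DTD bytes and the LocalDTDResolver class are hypothetical, and a real setup would read the declarations from a DTD or .ent file on disk.

```python
import io
import xml.sax
import xml.sax.handler
import xml.sax.xmlreader

# Hypothetical local copy of the entity declarations we need.
LOCAL_DTD = b'<!ENTITY nbsp "&#160;">'

class LocalDTDResolver(xml.sax.handler.EntityResolver):
    def __init__(self):
        self.requests = []

    def resolveEntity(self, publicId, systemId):
        # Called for the external DTD subset named in the DOCTYPE;
        # answer with an in-memory stream instead of fetching systemId.
        self.requests.append((publicId, systemId))
        source = xml.sax.xmlreader.InputSource(systemId)
        source.setByteStream(io.BytesIO(LOCAL_DTD))
        return source

class TextCollector(xml.sax.handler.ContentHandler):
    def __init__(self):
        super().__init__()
        self.text = []

    def characters(self, content):
        self.text.append(content)

xmlstr = (b'<?xml version="1.0"?>'
          b'<!DOCTYPE html SYSTEM "xhtml-local.dtd">'
          b'<html><body>hello&nbsp;</body></html>')

parser = xml.sax.make_parser()
# Needed so the reader consults the resolver for the external DTD.
parser.setFeature(xml.sax.handler.feature_external_ges, True)
resolver = LocalDTDResolver()
handler = TextCollector()
parser.setEntityResolver(resolver)
parser.setContentHandler(handler)
parser.parse(io.BytesIO(xmlstr))
print(resolver.requests, repr("".join(handler.text)))
```

Returning the stream from the resolver keeps the parser off the network; switching feature_external_ges on for untrusted input is a separate security decision.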

Thank you for your help.

-----------------------------------------------------------
#!/usr/bin/python
import StringIO,sys,xml.sax,xml.sax.handler

# Prints each element name and each chunk of character data.
class CHandler(xml.sax.handler.ContentHandler):
    def startElement(self, name, attrs):
        print name
    def characters(self, ch):
        print ch.encode('Latin-1')

# Stops the script as soon as the parser asks to resolve an external entity.
class EResolver(xml.sax.handler.EntityResolver):
    def resolveEntity(self,publicId,systemId):
        print " resolveEntity  ",publicId,systemId
        sys.exit()

# Stops the script on a notation or unparsed-entity declaration.
class DHandler(xml.sax.handler.DTDHandler):
    def notationDecl(self, name, publicId, systemId):
        print " notationDecl ",publicId,systemId
        sys.exit()
    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        print " unparsedEntityDecl ",publicId,systemId,ndata
        sys.exit()

xmlstr = '<?xml version="1.0"?><html><body>hello&nbsp;</body></html>'
parser = xml.sax.make_parser()
parser.setContentHandler(CHandler())
parser.setEntityResolver(EResolver())
parser.setDTDHandler(DHandler())
parser.parse(StringIO.StringIO(xmlstr))


-- 
Antony Lesuisse
GPG EA2CCD66: 4B7F 6061 3DF5 F07A ACFF  F127 6487 54F7 EA2C CD66