sax EntityResolver problem (expat?)

Ralf Schmitt ralf at brainbot.com
Fri Jun 11 06:16:47 EDT 2004


chris <csad7 at yahoo.com> writes:

> hi,
> sax beginner question i must admit:
>
> i try to filter a simple XHTML document with a standard DTD
> declaration (<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0
> Transitional//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">) in it.
> sax gives the following error
>
>  >>> xml.sax._exceptions.SAXParseException: <unknown>:53:8: undefined entity
>
> which is an &nbsp; entity.
> so i thought i just implement the EntityResolver class and use a local
> copy of the DTD
>
> # ========================
> class XHTMLResolver(xml.sax.handler.EntityResolver, object):
>
>      def resolveEntity(self, publicId, systemId):
>          return 'http://localhost/xhtml1-transitional.dtd'
>
> reader = xml.sax.make_parser()
> reader.setEntityResolver(XHTMLResolver())
> # ========================
>
> problem is, it seems expat does not use this resolver as i get the
> same error again. i also tried the following, which is not supported
> anyhow:
>
> reader.setFeature('http://xml.org/sax/features/external-parameter-entities', True)
>  >>> xml.sax._exceptions.SAXNotSupportedException: expat does not read external parameter entities
>
> is the XHTMLResolver class not the way it should be? or do i have to
> set another feature/property?

That's the way it works for me. You can also just open() your DTD
files and return an open file handle. Note that when using the above
DTD your resolveEntity will be called more than once, with different IDs.

--------------------------------
from xml.sax import saxutils, handler, make_parser, xmlreader
class Handler(handler.ContentHandler):
    def resolveEntity(self, publicid, systemid):
        print "RESOLVE:", publicid, systemid
        
        return open(systemid[systemid.rfind('/')+1:], "rb")
    def characters(self, s):
        print repr(s)
        
doc = r'''<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<HTML>
&nbsp;&auml;
</HTML>
'''

h = Handler()
parser = make_parser()
parser.setContentHandler(h)
parser.setEntityResolver(h)

parser.feed(doc)
parser.close()
-------
Output:

RESOLVE: -//W3C//DTD XHTML 1.0 Transitional//EN http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
RESOLVE: -//W3C//ENTITIES Latin 1 for XHTML//EN xhtml-lat1.ent
RESOLVE: -//W3C//ENTITIES Symbols for XHTML//EN xhtml-symbol.ent
RESOLVE: -//W3C//ENTITIES Special for XHTML//EN xhtml-special.ent
u'\n'
u'\xa0'
u'\xe4'
u'\n'
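
For readers on current Python 3: the listing above is Python 2 code.
Since Python 3.7.1 the expat-based SAX reader no longer processes
external general entities by default, so resolveEntity is only consulted
after feature_external_ges has been switched on. The following is a
minimal sketch under that assumption, not the code from this post. The
names LocalResolver, Collector and parse_with_local_dtd are hypothetical,
and LOCAL_DTD is a two-entity stand-in for a local copy of the DTD; a
real setup would return the opened xhtml1-transitional.dtd and its .ent
files instead.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical stand-in for a local DTD copy: it declares only the two
# entities used in the sample document below.
LOCAL_DTD = b'<!ENTITY nbsp "&#160;">\n<!ENTITY auml "&#228;">\n'


class LocalResolver(EntityResolver):
    """Answer every external entity request from local data."""

    def __init__(self):
        self.requests = []  # (publicId, systemId) pairs, in call order

    def resolveEntity(self, publicId, systemId):
        self.requests.append((publicId, systemId))
        source = InputSource(systemId)
        # A real resolver could use open(local_path, "rb") here instead.
        source.setByteStream(io.BytesIO(LOCAL_DTD))
        return source


class Collector(ContentHandler):
    """Collect character data so the expanded entities can be inspected."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_with_local_dtd(document):
    """Parse document and return (text, resolver requests)."""
    resolver = LocalResolver()
    collector = Collector()
    parser = make_parser()
    # Without this, Python 3.7.1+ skips external entities and the
    # resolver is never called.
    parser.setFeature(feature_external_ges, True)
    parser.setContentHandler(collector)
    parser.setEntityResolver(resolver)
    parser.feed(document)
    parser.close()
    return "".join(collector.chunks), resolver.requests


doc = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
    '<html>&nbsp;&auml;</html>'
)
text, requests = parse_with_local_dtd(doc)
```

Here requests records the public and system IDs the parser asked for,
which is a convenient place to map IDs to local file names. Returning an
InputSource rather than a bare string keeps the parser from fetching the
system ID over the network.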

>
>
> ultimately i do not want to use the http://localhost copy but i would
> like to read the local file (just with open(...) or something) and go
> from there. is that possible? do i have to
>
>
> thanks a lot
> chris

-- 
brainbot technologies ag
boppstrasse 64 . 55118 mainz . germany
fon +49 6131 211639-1 . fax +49 6131 211639-2
http://brainbot.com/  mailto:ralf at brainbot.com


