XML file parsing with SAX
Uche Ogbuji
uche.ogbuji at gmail.com
Sat Apr 23 15:48:49 EDT 2005
On Sat, 2005-04-23 at 15:20 +0200, Willem Ligtenberg wrote:
> I decided to use SAX to parse my xml file.
> But the parser crashes on:
>   File "/usr/lib/python2.3/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
>     raise exception
> xml.sax._exceptions.SAXParseException: NCBI_Entrezgene.dtd:8:0: error in processing external entity reference
>
> This is caused by:
> <!DOCTYPE Entrezgene-Set PUBLIC "-//NCBI//NCBI Entrezgene/EN"
> "NCBI_Entrezgene.dtd">
>
> If I remove it, it parses normally.
> I've created my parser like this:
> import sys
> from xml.sax import make_parser
> from handler import EntrezGeneHandler
>
> fopen = open("mouse2.xml", "r")
> ch = EntrezGeneHandler()
> saxparser = make_parser()
> saxparser.setContentHandler(ch)
> saxparser.parse(fopen)
>
> And the handler is:
> from xml.sax import ContentHandler
>
> class EntrezGeneHandler(ContentHandler):
>     """
>     A handler to deal with EntrezGene in XML
>     """
>
>     def startElement(self, name, attrs):
>         print "Start element:", name
>
> So it doesn't do much yet. And still it crashes...
> How can I tell the parser not to look at the DOCTYPE declaration?
> On a website:
> http://www.devarticles.com/c/a/XML/Parsing-XML-with-SAX-and-Python/1/
> it states that the SAX parsers are not validating, so this error shouldn't
> even occur?
Just because the parser is not validating doesn't mean it won't try
to read external entities, the external DTD subset included.
Maybe you're looking for
"""
feature_external_ges
Value: "http://xml.org/sax/features/external-general-entities"
true: Include all external general (text) entities.
false: Do not include external general entities.
access: (parsing) read-only; (not parsing) read/write
"""
Quote from:
http://docs.python.org/lib/module-xml.sax.handler.html
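
A minimal sketch of switching that feature off. It is written against the
Python 3 standard-library xml.sax, not the PyXML (_xmlplus) install shown in
the traceback, so behaviour there may differ. NameCollector,
parse_without_external_entities and the inline SAMPLE document are made up
for illustration, and the DTD that SAMPLE names is assumed not to exist on
disk.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, feature_external_ges


class NameCollector(ContentHandler):
    """Hypothetical handler that records element names in document order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_without_external_entities(stream):
    """Parse stream with external general entities turned off."""
    parser = make_parser()
    # Must be set before parse() is called; the feature is read-only
    # while parsing is in progress.
    parser.setFeature(feature_external_ges, False)
    handler = NameCollector()
    parser.setContentHandler(handler)
    parser.parse(stream)
    return handler.names


# Illustrative document: same DOCTYPE as in the question, but the DTD file
# is assumed to be absent, so the parser must not try to open it.
SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE Entrezgene-Set PUBLIC "-//NCBI//NCBI Entrezgene/EN"
  "NCBI_Entrezgene.dtd">
<Entrezgene-Set><Entrezgene/></Entrezgene-Set>
"""

names = parse_without_external_entities(io.BytesIO(SAMPLE))
print(names)
```

The feature has to be set between make_parser() and parse(), as the
"access" line in the quote above says.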
But you're on pretty shaky ground in any XML 1.x toolkit using a bogus
DTDecl in this way. Why go through the hassle? Why not use a catalog,
or remove the DTDecl?
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://fourthought.com
http://copia.ogbuji.net http://4Suite.org
Use CSS to display XML, part 2 - http://www-128.ibm.com/developerworks/edu/x-dw-x-xmlcss2-i.html
XML Output with 4Suite & AMara - http://www.xml.com/pub/a/2005/04/20/py-xml.html
Use XSLT to prepare XML for import into OpenOffice Calc - http://www.ibm.com/developerworks/xml/library/x-oocalc/
Schema standardization for top-down semantic transparency - http://www-128.ibm.com/developerworks/xml/library/x-think31.html