XML file parsing with SAX

Willem Ligtenberg WLigtenberg at gmail.com
Sat Apr 23 16:20:17 EDT 2005


I didn't make the XML file, and I don't like messing with other people's
data, so I just want my SAX parser to ignore the DOCTYPE declaration. I
can't help it if other people make it hard for me to read their XML file...
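
A minimal sketch of what "ignore it" could look like, assuming the
feature_external_ges feature quoted below is honoured by the default
expat-based parser. It uses Python 3 syntax rather than 2.3, and
ElementNameCollector and the inline SAMPLE document are hypothetical
stand-ins for EntrezGeneHandler and mouse2.xml:

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, feature_external_ges


class ElementNameCollector(ContentHandler):
    """Hypothetical stand-in for EntrezGeneHandler: records element names."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_ignoring_external_entities(stream, handler):
    """Parse stream without loading external entities such as the DTD."""
    parser = make_parser()
    # Set the feature before parse(); it is read-only while parsing.
    parser.setFeature(feature_external_ges, False)
    parser.setContentHandler(handler)
    parser.parse(stream)
    return handler


# Assumption: a tiny document carrying the same DOCTYPE as the real file.
# NCBI_Entrezgene.dtd does not need to exist for this to parse.
SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE Entrezgene-Set PUBLIC "-//NCBI//NCBI Entrezgene/EN"
"NCBI_Entrezgene.dtd">
<Entrezgene-Set><Entrezgene/></Entrezgene-Set>
"""

collector = parse_ignoring_external_entities(io.BytesIO(SAMPLE),
                                             ElementNameCollector())
print(collector.names)
```

As far as I can tell, Python 3.7.1 and later already leave this feature
off by default, so there the explicit setFeature() call mostly documents
the intent.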

On Sat, 23 Apr 2005 13:48:49 -0600, Uche Ogbuji wrote:

> On Sat, 2005-04-23 at 15:20 +0200, Willem Ligtenberg wrote:
>> I decided to use SAX to parse my xml file.
>> But the parser crashes on:
>>   File "/usr/lib/python2.3/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
>>     raise exception
>> xml.sax._exceptions.SAXParseException: NCBI_Entrezgene.dtd:8:0: error in processing external entity reference
>> 
>> This is caused by:
>> <!DOCTYPE Entrezgene-Set PUBLIC "-//NCBI//NCBI Entrezgene/EN"
>> "NCBI_Entrezgene.dtd">
>> 
>> If I remove it, it parses normally.
>> I've created my parser like this:
>> import sys
>> from xml.sax import make_parser
>> from handler import EntrezGeneHandler
>> 
>> fopen = open("mouse2.xml", "r")
>> ch = EntrezGeneHandler()
>> saxparser = make_parser()
>> saxparser.setContentHandler(ch)
>> saxparser.parse(fopen)
>> 
>> And the handler is:
>> from xml.sax import ContentHandler
>> 
>> class EntrezGeneHandler(ContentHandler):
>> 	"""
>> 	A handler to deal with EntrezGene in XML
>> 	"""
>> 	
>> 	def startElement(self, name, attrs):
>> 		print "Start element:", name
>> 
>> So it doesn't do much yet. And still it crashes...
>> How can I tell the parser not to look at the DOCTYPE declaration?
>> On a website:
>> http://www.devarticles.com/c/a/XML/Parsing-XML-with-SAX-and-Python/1/
>> it states that the SAX parsers are not validating, so this error shouldn't
>> even occur?
> 
> Just because it's not validating doesn't mean that the parser won't try
> to read the external entity.
> 
> Maybe you're looking for 
> 
> """
> feature_external_ges
>         Value: "http://xml.org/sax/features/external-general-entities" 
>         true: Include all external general (text) entities. 
>         false: Do not include external general entities. 
>         access: (parsing) read-only; (not parsing) read/write
> """
> 
> Quote from:
> 
> http://docs.python.org/lib/module-xml.sax.handler.html
> 
> But you're on pretty shaky ground in any XML 1.x toolkit using a bogus
> DTDecl in this way.  Why go through the hassle?  Why not use a catalog,
> or remove the DTDecl?
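
On the catalog suggestion: xml.sax does not read catalog files itself, as
far as I know, so the following is only a hand-rolled stand-in for one. It
assumes an EntityResolver that maps the public ID from the DOCTYPE to a
local replacement, here an empty in-memory DTD. LocalDTDResolver,
LOCAL_DTDS, NameCollector and SAMPLE are all hypothetical names, and the
code is Python 3:

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Assumption: an empty stand-in for the real DTD, keyed by public ID.
LOCAL_DTDS = {"-//NCBI//NCBI Entrezgene/EN": b""}


class LocalDTDResolver(EntityResolver):
    """Serve known public IDs from memory instead of disk or network."""

    def __init__(self):
        self.seen = []  # (publicId, systemId) pairs the parser asked for

    def resolveEntity(self, publicId, systemId):
        self.seen.append((publicId, systemId))
        if publicId in LOCAL_DTDS:
            source = InputSource(systemId)
            source.setByteStream(io.BytesIO(LOCAL_DTDS[publicId]))
            return source
        # Unknown entity: fall back to the default (load by system ID).
        return systemId


class NameCollector(ContentHandler):
    """Records element names as they are opened."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE Entrezgene-Set PUBLIC "-//NCBI//NCBI Entrezgene/EN"
"NCBI_Entrezgene.dtd">
<Entrezgene-Set><Entrezgene/></Entrezgene-Set>
"""

resolver = LocalDTDResolver()
handler = NameCollector()
parser = make_parser()
# The resolver is only consulted when external entities are enabled.
parser.setFeature(feature_external_ges, True)
parser.setEntityResolver(resolver)
parser.setContentHandler(handler)
parser.parse(io.BytesIO(SAMPLE))
print(handler.names, resolver.seen)
```

Unlike switching the feature off, this keeps external entity loading
enabled for anything not listed in LOCAL_DTDS, which may or may not be
what you want for untrusted input.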



