XML file parsing with SAX
Willem Ligtenberg
WLigtenberg at gmail.com
Sat Apr 23 16:20:17 EDT 2005
I didn't make the XML file, and I don't like messing with other people's
data, so I just want my SAX parser to ignore it. I can't help it if other
people make it hard for me to read their XML file...
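For reference, here is a minimal sketch of the approach Uche points to below, written in modern Python 3 (the thread itself uses Python 2.3). It switches off `feature_external_ges` before calling `parse()`, so the expat-based reader skips the DTD named in the DOCTYPE and the file never has to be edited. The name-collecting `EntrezGeneHandler`, the `parse_ignoring_dtd` helper, and the inline `SAMPLE` standing in for mouse2.xml are illustrative assumptions. Python 3.7.1 and later already default this feature to off, so on those versions the explicit call mainly documents intent.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, feature_external_ges


class EntrezGeneHandler(ContentHandler):
    """Collect start-element names instead of printing them (illustrative)."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_ignoring_dtd(source):
    """Parse a filename or binary file object without loading external entities."""
    handler = EntrezGeneHandler()
    parser = make_parser()
    parser.setContentHandler(handler)
    # Must be set before parse(); the expat reader then skips the external
    # DTD named in the DOCTYPE instead of trying to open it.
    parser.setFeature(feature_external_ges, False)
    parser.parse(source)
    return handler.names


# Hypothetical stand-in for mouse2.xml; NCBI_Entrezgene.dtd need not exist.
SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE Entrezgene-Set PUBLIC "-//NCBI//NCBI Entrezgene/EN"
  "NCBI_Entrezgene.dtd">
<Entrezgene-Set><Entrezgene/></Entrezgene-Set>
"""

if __name__ == "__main__":
    # With the real file this would be parse_ignoring_dtd("mouse2.xml").
    print(parse_ignoring_dtd(io.BytesIO(SAMPLE)))
    # prints ['Entrezgene-Set', 'Entrezgene']
```

Because the file is only read, the NCBI data stays untouched. The trade-off is that anything the DTD would have contributed, such as entity declarations or default attribute values, is silently missing.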
On Sat, 23 Apr 2005 13:48:49 -0600, Uche Ogbuji wrote:
> On Sat, 2005-04-23 at 15:20 +0200, Willem Ligtenberg wrote:
>> I decided to use SAX to parse my XML file.
>> But the parser crashes on:
>> File "/usr/lib/python2.3/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
>> raise exception
>> xml.sax._exceptions.SAXParseException: NCBI_Entrezgene.dtd:8:0: error in processing external entity reference
>>
>> This is caused by:
>> <!DOCTYPE Entrezgene-Set PUBLIC "-//NCBI//NCBI Entrezgene/EN"
>> "NCBI_Entrezgene.dtd">
>>
>> If I remove it, it parses normally.
>> I've created my parser like this:
>> import sys
>> from xml.sax import make_parser
>> from handler import EntrezGeneHandler
>>
>> fopen = open("mouse2.xml", "r")
>> ch = EntrezGeneHandler()
>> saxparser = make_parser()
>> saxparser.setContentHandler(ch)
>> saxparser.parse(fopen)
>>
>> And the handler is:
>> from xml.sax import ContentHandler
>>
>> class EntrezGeneHandler(ContentHandler):
>>     """
>>     A handler to deal with EntrezGene in XML
>>     """
>>
>>     def startElement(self, name, attrs):
>>         print "Start element:", name
>>
>> So it doesn't do much yet. And still it crashes...
>> How can I tell the parser not to look at the DOCTYPE declaration?
>> According to this page:
>> http://www.devarticles.com/c/a/XML/Parsing-XML-with-SAX-and-Python/1/
>> SAX parsers are not validating, so why does this error occur at all?
>
> Just because it's not validating doesn't mean that the parser won't try
> to read the external entity.
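Uche's point can be checked with a small experiment, sketched here in Python 3 as an illustration rather than anything from the thread. The hypothetical `RecordingResolver` only logs which external entities the non-validating expat reader asks for, and it answers each request with an empty stream so nothing is fetched. With `feature_external_ges` switched on (the default before Python 3.7.1), the reader still requests the DTD from the DOCTYPE, which is the same lookup that fails in the traceback above.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource


class RecordingResolver(EntityResolver):
    """Record each external entity the parser requests; return an empty stream."""

    def __init__(self):
        self.requests = []

    def resolveEntity(self, publicId, systemId):
        self.requests.append((publicId, systemId))
        empty = InputSource()
        empty.setByteStream(io.BytesIO(b""))  # nothing is read from disk or network
        return empty


def external_requests(xml_bytes):
    """Return the (publicId, systemId) pairs requested while parsing xml_bytes."""
    parser = make_parser()  # expat-based and non-validating
    parser.setContentHandler(ContentHandler())
    parser.setFeature(feature_external_ges, True)  # the pre-3.7.1 default
    resolver = RecordingResolver()
    parser.setEntityResolver(resolver)
    parser.parse(io.BytesIO(xml_bytes))
    return resolver.requests


SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE Entrezgene-Set PUBLIC "-//NCBI//NCBI Entrezgene/EN"
  "NCBI_Entrezgene.dtd">
<Entrezgene-Set/>
"""

if __name__ == "__main__":
    print(external_requests(SAMPLE))
    # prints [('-//NCBI//NCBI Entrezgene/EN', 'NCBI_Entrezgene.dtd')]
```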
>
> Maybe you're looking for
>
> """
> feature_external_ges
> Value: "http://xml.org/sax/features/external-general-entities"
> true: Include all external general (text) entities.
> false: Do not include external general entities.
> access: (parsing) read-only; (not parsing) read/write
> """
>
> Quote from:
>
> http://docs.python.org/lib/module-xml.sax.handler.html
>
> But you're on pretty shaky ground in any XML 1.x toolkit using a bogus
> document type declaration (DTDecl) in this way. Why go through the hassle?
> Why not use a catalog, or remove the DTDecl?
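Python's standard library has no OASIS XML Catalog support, so the sketch below approximates Uche's catalog suggestion with a SAX `EntityResolver` that maps public identifiers to local DTD copies. `CatalogResolver`, `LOCAL_CATALOG`, `parse_with_catalog`, and the file path in the catalog are hypothetical. The resolver must be paired with `feature_external_ges` switched on, because with it off (the default from Python 3.7.1) the expat reader never consults the resolver for the DTD.

```python
import io
import os
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical catalog: public identifier -> local copy of the DTD.
# The path is only an example; use absolute paths to wherever the DTDs live.
LOCAL_CATALOG = {
    "-//NCBI//NCBI Entrezgene/EN": "/usr/local/share/ncbi/NCBI_Entrezgene.dtd",
}


class CatalogResolver(EntityResolver):
    """Resolve known public IDs to local files; give anything else an empty body."""

    def __init__(self, catalog):
        self.catalog = catalog

    def resolveEntity(self, publicId, systemId):
        local_path = self.catalog.get(publicId)
        if local_path is not None and os.path.isfile(local_path):
            return local_path  # a system ID string; the reader opens the file
        empty = InputSource()
        empty.setByteStream(io.BytesIO(b""))
        return empty


class NameCollector(ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_with_catalog(source, catalog=LOCAL_CATALOG):
    handler = NameCollector()
    parser = make_parser()
    parser.setContentHandler(handler)
    # The resolver is only consulted when external entities are enabled.
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(CatalogResolver(catalog))
    parser.parse(source)
    return handler.names


SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE Entrezgene-Set PUBLIC "-//NCBI//NCBI Entrezgene/EN"
  "NCBI_Entrezgene.dtd">
<Entrezgene-Set><Entrezgene/></Entrezgene-Set>
"""

if __name__ == "__main__":
    print(parse_with_catalog(io.BytesIO(SAMPLE)))
```

Falling back to an empty body keeps the parse going when no local copy is configured, but it silently drops whatever the DTD would have declared. Raising an error in that branch instead is the stricter choice.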