xml.sax feature question

Martin v. Löwis martin at v.loewis.de
Sun Oct 26 04:56:50 EST 2003


christof hoeke <csad7 at yahoo.com> writes:

> the problem i have is that if the xmlfile has a doctype declaration
> the sax parser tries to load it and fails (IOError of course).
> partly because the path to the DTD is just a simple name in the same
> dir e.g. <!DOCTYPE contacts SYSTEM "contacts.dtd"> and i guess the
> parser does not use the path os.path.walk uses (can i somehow give the
> parser this information?). but it also could be a DTD which should be
> loaded over a network which is not available at the time.

In XML, the SYSTEM identifier is a URI reference; in your case, it is
a relative URL. An XML processor must resolve this relative URL
against the URL of the main document. If the main document is on a
local disk, the relative URL is interpreted relative to the document's
file name, not relative to the current working directory. So you
should put the DTD alongside the document (in the same directory).
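
To illustrate the resolution rule, here is a minimal sketch using
Python 3's urllib.parse.urljoin. The document location
/home/user/data/contacts.xml is a made-up example, not taken from the
original question; only the relative name contacts.dtd is.

```python
from urllib.parse import urljoin

# Hypothetical location of the main document (assumption for illustration).
document_url = "file:///home/user/data/contacts.xml"

# The system identifier exactly as written in the DOCTYPE declaration.
system_id = "contacts.dtd"

# A relative system identifier is resolved against the document's own URL,
# so the DTD is expected in the same directory as the document.
dtd_url = urljoin(document_url, system_id)
print(dtd_url)
```

The result names a file next to contacts.xml, whatever directory the
program happens to be started from.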

> i guess to simply set a feature of the sax parser to not try to load
> any external DTDs should work. question is which feature do i have to
> disable?
>     p = xml.sax.make_parser()
>     p.setFeature('http://xml.org/sax/features/validation', False)
> 
> i thought turning off the validation would stop the parser to load
> external DTDs, but it still tries to load them.

This just turns off validation. The parser you are using is not a
validating parser anyway, so this has no effect. The parser still
loads the DTD, because it needs the entity declarations in it to
expand any entity references it may encounter.

> any other suggestions?

You need to turn off resolution of external general entities:

p.setFeature("http://xml.org/sax/features/external-general-entities", False)
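
For context, a self-contained sketch in Python 3 of how that call fits
into a complete parse. The ElementCollector handler, the function name
and the sample document are illustrative assumptions;
feature_external_ges is the constant xml.sax.handler defines for the
feature URL above. Note that since Python 3.7.1 this feature is
already off by default, so setting it explicitly mainly documents the
intent.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class ElementCollector(ContentHandler):
    """Illustrative handler: records the name of every start tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


# Sample document referring to a DTD that does not exist anywhere.
DOC = (b'<?xml version="1.0"?>\n'
       b'<!DOCTYPE contacts SYSTEM "contacts.dtd">\n'
       b'<contacts><contact/></contacts>')


def parse_without_external_dtd(data):
    parser = xml.sax.make_parser()
    # Do not fetch external general entities, including the external DTD.
    parser.setFeature(feature_external_ges, False)
    handler = ElementCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.names


names = parse_without_external_dtd(DOC)
print(names)
```

The parse completes even though contacts.dtd is missing. The price is
that entities declared only in the external DTD will not be expanded.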

Alternatively, you can install an entity resolver (with
p.setEntityResolver) which then uses a different mechanism for
locating the DTD (and other external entities).
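
A minimal sketch of that alternative in Python 3, assuming the DTD
text is kept in memory. LocalDTDResolver, KNOWN_DTDS and the DTD
content are hypothetical names and data chosen for illustration; the
EntityResolver and InputSource classes and setEntityResolver come from
xml.sax. The resolver is only consulted when the
external-general-entities feature is enabled, so the sketch turns it
on explicitly.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical in-memory store: system identifier -> DTD bytes.
KNOWN_DTDS = {
    "contacts.dtd": b"<!ELEMENT contacts (contact*)>\n<!ELEMENT contact EMPTY>\n",
}


class LocalDTDResolver(EntityResolver):
    """Serves external entities from memory instead of disk or network."""

    def __init__(self, known):
        self.known = known
        self.requested = []  # system identifiers the parser asked for

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        source = InputSource(systemId)
        # Unknown identifiers get an empty replacement instead of a fetch.
        source.setByteStream(io.BytesIO(self.known.get(systemId, b"")))
        return source


class ElementCollector(ContentHandler):
    """Illustrative handler: records the name of every start tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


DOC = (b'<?xml version="1.0"?>\n'
       b'<!DOCTYPE contacts SYSTEM "contacts.dtd">\n'
       b'<contacts><contact/></contacts>')

resolver = LocalDTDResolver(KNOWN_DTDS)
collector = ElementCollector()
parser = xml.sax.make_parser()
# The resolver is bypassed entirely while this feature is off.
parser.setFeature(feature_external_ges, True)
parser.setEntityResolver(resolver)
parser.setContentHandler(collector)
parser.parse(io.BytesIO(DOC))
print(resolver.requested, collector.names)
```

With this approach the DTD can live anywhere (a package directory, a
cache, a string constant), and a missing network connection no longer
matters.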

Regards,
Martin



