[XML-SIG] Replicating DTD information using XMLFilterBase and XMLGenerator

Stefan Behnel stefan_ml at behnel.de
Sun Jul 27 22:38:44 CEST 2008


Hi,

James Sulak wrote:
> I'm attempting to use xml.sax.saxutils.XMLFilterBase and XMLGenerator to
> take an input XML document, filter out certain elements, and output
> the result to a second XML file.  I have it mostly working, except
> that I lose the DTD declaration and anything (processing instructions
> or comments) before the root element.  I believe I'm supposed to be
> using a LexicalHandler to get the information from the DTD, but I have
> not been able to figure out how to do this, or how to integrate it
> with the rest of the code.
>
> I'm pretty new at using Python (and SAX, for that matter) to work with
> XML

Try lxml's iterparse() instead of SAX. It will build an in-memory tree
(including the DTD or its reference if you want, see the parser docs), but you
can remove the unwanted elements from the tree while it parses. It's still
pretty memory friendly and definitely a lot easier to work with than SAX.

http://codespeak.net/lxml/parsing.html#iterparse-and-iterwalk
http://codespeak.net/lxml/tutorial.html#parsing-from-strings-and-files
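
Roughly, the approach could look like the sketch below. The helper names
(drop_child, filter_document), the sample document and the "drop" tag are made
up for illustration, so adapt them to your data. It removes unwanted children
once their parent is complete, which stays within what the iterparse docs
describe as safe (modifying descendants during the "end" event), and it
serialises the whole tree rather than just the root element so that the
DOCTYPE and anything before the root are written out again.

```python
import io
from lxml import etree

# Hypothetical input: a DOCTYPE reference, a comment and a PI before the root.
SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE doc SYSTEM "doc.dtd">
<!-- leading comment -->
<?app hint?>
<doc><keep>a</keep><drop>b</drop> tail <keep>c</keep></doc>
"""


def drop_child(parent, child):
    """Remove `child` from `parent`, keeping the text that follows it.

    In lxml the tail text belongs to the element, so a plain remove()
    would discard it as well.
    """
    if child.tail:
        previous = child.getprevious()
        if previous is not None:
            previous.tail = (previous.tail or "") + child.tail
        else:
            parent.text = (parent.text or "") + child.tail
    parent.remove(child)


def filter_document(source, unwanted_tags):
    """Parse `source` incrementally and drop elements named in `unwanted_tags`.

    Returns the resulting ElementTree.
    """
    context = etree.iterparse(source, events=("end",))
    for _event, elem in context:
        # At the "end" event all children of `elem` are fully parsed,
        # so they can be removed without confusing the parser.
        for child in list(elem):
            if child.tag in unwanted_tags:
                drop_child(elem, child)
    # After parsing, the root element is available on the iterator.
    return context.root.getroottree()


tree = filter_document(io.BytesIO(SAMPLE), {"drop"})
# Serialising the tree (not the root element) includes the DOCTYPE
# and the comments / processing instructions before the root.
result = etree.tostring(tree, encoding="utf-8", xml_declaration=True)
print(result.decode("utf-8"))
```

For real files you would pass a file name or an open file to
filter_document() and write the result with tree.write() instead of printing
it. An unwanted element stays in memory until its parent has been closed, so
for very large, flat documents you may want to clean up more eagerly.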

Stefan

