[XML-SIG] no 'writexml' when building a domTree from ext.Sax2
Andrew Clover
and-xml at doxdesk.com
Wed Jul 28 07:20:51 CEST 2004
Alexandre Conrad <aconrad.tlv at magic.fr> wrote:
> Before, I used to generate XML files with doc.writexml(f) when doc was
> created with 'doc = xml.dom.minidom.Document()'. But now, I have a dom
> tree from the ext.Sax2.Reader() but I can't 'writexml'.
Yes. Trees built by xml.dom.ext.reader come from the PyXML-only 4DOM
implementation, which is completely different code from the Python/PyXML
minidom implementation.
There is no standard DOM interface for serialising a document(*), so the
implementations each do it their own way. With 4DOM, instead of
writexml/toxml you get a separate serialiser object, e.g. (where document
is the tree returned by the reader):
    import sys
    from xml.dom.ext.Printer import PrintVisitor
    PrintVisitor(sys.stdout, 'utf-8').visit(document)
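For comparison, the minidom style the question started from puts the serialiser on the node itself rather than in a separate object. A minimal sketch using only the standard library's xml.dom.minidom (it will not work on a 4DOM tree; the element names are made up for illustration):

```python
import io
import xml.dom.minidom

# Build a tiny document the minidom way (element names are illustrative).
doc = xml.dom.minidom.Document()
root = doc.createElement('x')
doc.appendChild(root)
root.appendChild(doc.createElement('y'))

# minidom serialises through methods on the node itself:
# writexml() writes to any object with a write() method,
# toxml() returns the same markup as a string.
buf = io.StringIO()
doc.writexml(buf)
print(buf.getvalue())  # <?xml version="1.0" ?><x><y/></x>
print(doc.toxml())     # <?xml version="1.0" ?><x><y/></x>
```

Passing an open file instead of the StringIO gives the doc.writexml(f) usage from the question.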
* - well, other than the new DOM Level 3 LS standard, which neither
minidom nor 4DOM yet support. (Insert customary pxdom plug here.)
> Also, I'm curious how I can tell Sax2.Reader() to ignore indentations
> and newlines when reading from a pretty printed document.
XML normally treats whitespace as significant, so parsers should not
generally remove or mangle it.
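To see what that means in practice, here is a small sketch using the standard library's expat-based minidom rather than the 4DOM reader (the document is made up): the indentation of a pretty-printed document survives parsing as ordinary text nodes.

```python
from xml.dom import minidom

# A pretty-printed document: the newlines and indentation are real character data.
pretty = "<list>\n  <item/>\n  <item/>\n</list>"
root = minidom.parseString(pretty).documentElement

# Each whitespace run between the tags becomes its own Text node.
for node in root.childNodes:
    if node.nodeType == node.TEXT_NODE:
        print('text', repr(node.data))
    else:
        print('element', node.tagName)
```

This prints five lines: text, element, text, element, text.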
The (optional) exception is 'element content whitespace': whitespace-only
text nodes inside elements whose content model (defined in the DTD, in an
<!ELEMENT> declaration) says they contain only other elements and no text.
The Sax2 reader defaults to discarding element content whitespace
(keepAllWs= 0), but the option doesn't actually work unless you tell it
to use the DTD-validating parser:
    >>> from xml.dom.ext.reader.Sax2 import Reader
    >>> markup= '<!DOCTYPE x [<!ELEMENT x (x)*>]> <x> <x/></x>'
    >>> Reader().fromString(markup).documentElement.childNodes
    <NodeList [<Text Node ' '>, <Element Node 'x'>]>
    >>> Reader(validate= 1).fromString(markup).documentElement.childNodes
    <NodeList [<Element Node 'x'>]>
If you're not using a DTD the extra whitespace nodes can't be avoided.
(Other than with pxdom and the non-standard extension
'pxdom-assume-element-content'.)
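The nodes will still be in the parsed tree, but if you know which of your elements are element-only you can remove them yourself afterwards. This is a minimal sketch under stated assumptions: it uses minidom rather than 4DOM, and strip_element_content_whitespace and its element_only set are hypothetical names, the set standing in for what an <!ELEMENT> declaration would otherwise tell a validating parser.

```python
from xml.dom import minidom

def strip_element_content_whitespace(node, element_only):
    """Hypothetical helper: remove whitespace-only Text children of elements
    whose tag name is in element_only (a hand-maintained stand-in for the
    DTD's element-only content models), recursing through the whole tree."""
    for child in list(node.childNodes):  # copy, since we remove while iterating
        if child.nodeType == child.ELEMENT_NODE:
            strip_element_content_whitespace(child, element_only)
        elif (child.nodeType == child.TEXT_NODE
              and node.nodeType == node.ELEMENT_NODE
              and node.tagName in element_only
              and not child.data.strip()):
            node.removeChild(child)

# No DTD here: we simply declare by hand that <x> holds only elements.
doc = minidom.parseString('<x> <x/><p>a <b/> c</p></x>')
strip_element_content_whitespace(doc, {'x'})
print(doc.documentElement.toxml())  # <x><x/><p>a <b/> c</p></x>
```

Only elements named in the set are touched, so the mixed content of <p> keeps its spaces; a blanket "drop every whitespace-only text node" pass would also eat meaningful whitespace such as the space in <p><b>a</b> <i>b</i></p>.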
--
Andrew Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/