[XML-SIG] no 'writexml' when building a domTree from ext.Sax2

Andrew Clover and-xml at doxdesk.com
Wed Jul 28 07:20:51 CEST 2004


Alexandre Conrad <aconrad.tlv at magic.fr> wrote:

> Before, I used to generate XML files with doc.writexml(f) when doc was 
> created with 'doc = xml.dom.minidom.Document()'. But now, I have a dom 
> tree from the ext.Sax2.Reader() but I can't 'writexml'.

Yes. Trees built by xml.dom.ext.reader are from the PyXML-only 4DOM 
implementation, which is completely different code to the Python/PyXML 
minidom implementation.

There is no standard interface for serialising a document(*) so the 
implementations have different ways of doing it. With 4DOM, instead of 
writexml/toxml you get a separate serialiser object, eg:

   import sys
   from xml.dom.ext.Printer import PrintVisitor
   PrintVisitor(sys.stdout, 'utf-8').visit(document)

* - well, other than the new DOM Level 3 LS standard, which neither
     minidom nor 4DOM yet support. (Insert customary pxdom plug here.)
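
For comparison, here is a minimal sketch of the minidom side of that 
difference, where serialisation is a method on the node itself. It assumes 
a current Python 3 standard library rather than the PyXML of this thread, 
and the tiny <x> document is only an illustration:

```python
import io
from xml.dom import minidom

# Build a tiny document with the standard-library minidom.
doc = minidom.Document()
root = doc.createElement('x')
root.appendChild(doc.createElement('x'))
doc.appendChild(root)

# writexml() serialises into any object that has a write() method.
buf = io.StringIO()
doc.writexml(buf)
serialised = buf.getvalue()

# toxml() returns the serialisation as a string instead.
print(doc.toxml())
```

With 4DOM there is no such method on the document, which is why a 
separate serialiser object is needed there.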

> Also, I'm curious how I can tell Sax2.Reader() to ignore indentations 
> and newlines when reading from a pretty printed document.

XML normally says whitespace is significant, so parsers should not 
generally remove or mangle it.

The (optional) exception is 'element content whitespace', whitespace 
nodes that are inside elements whose content model (defined in the DTD, 
in a <!ELEMENT> declaration) says they contain only other elements, no text.

The Sax2 reader defaults to discarding element content whitespace 
(keepAllWs= 0), but the option doesn't actually work unless you tell it 
to use the DTD-validating parser:

   from xml.dom.ext.reader.Sax2 import Reader
   markup= '<!DOCTYPE x [<!ELEMENT x (x)*>]> <x>    <x/></x>'

   Reader().fromString(markup).documentElement.childNodes

     <NodeList [<Text Node '    '>, <Element Node 'x'>]>

   Reader(validate= 1).fromString(markup).documentElement.childNodes

     <NodeList [<Element Node 'x'>]>

If you're not using a DTD the extra whitespace nodes can't be avoided. 
(Other than with pxdom and the non-standard extension 
'pxdom-assume-element-content'.)
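
If the parser can't be told to drop them, one workaround is to remove the 
whitespace-only text nodes yourself after parsing. The sketch below is 
only that: it uses the standard-library minidom on Python 3 rather than 
the Sax2 reader, strip_whitespace_nodes is a hypothetical helper name, and 
it assumes no whitespace-only text in the document is meaningful. It is 
not the same thing as DTD-driven element content whitespace handling, and 
it would damage mixed content where such whitespace matters.

```python
from xml.dom import minidom

def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace.

    Hypothetical helper: assumes whitespace-only text is never significant.
    """
    # Iterate over a copy, since childNodes changes as nodes are removed.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)

doc = minidom.parseString('<x>    <x/></x>')
before = len(doc.documentElement.childNodes)  # whitespace text + element
strip_whitespace_nodes(doc)
after = len(doc.documentElement.childNodes)   # element only
```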

-- 
Andrew Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/