[XML-SIG] DOM: Multiple proxy problem

Andrew M. Kuchling akuchlin@cnri.reston.va.us
Tue, 6 Oct 1998 13:54:14 -0400 (EDT)


Fred L. Drake writes:
>Andrew M. Kuchling writes:
> > tree seems to be correct, though it's leaving in Text nodes containing
> > ignorable whitespace; got to fix that...
>
>  What's wrong with this?  I should be able to get ignorable
>whitespace nodes as well as regular whitespace nodes.  Is there at
>least an option to get the whole thing, if the parser provides the
>events?

	The DOM spec doesn't actually mention ignorable whitespace,
presumably because the spec is concerned with navigating over an
existing tree, not how that tree came into existence.  Ignorable
whitespace would then be a matter for sax_builder, or whatever you use
to construct a tree.  (Personally, I'd prefer to lose the whitespace,
and add pretty-printing to the .toxml() method.)
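As a rough illustration of that preference (drop the whitespace, then
pretty-print on output), here is a minimal sketch.  It assumes the
standard-library xml.dom.minidom rather than the DOM package discussed
in this thread, and the helper name strip_whitespace_nodes is made up
for the example.  It also treats every whitespace-only Text node as
droppable, which is a heuristic and not the DTD-based notion of
ignorable whitespace.

```python
from xml.dom import minidom, Node

def strip_whitespace_nodes(node):
    """Recursively remove Text children that contain only whitespace.

    Hypothetical helper: a heuristic, not DTD-aware ignorable whitespace.
    """
    # Iterate over a copy, since children are removed during the loop.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)
    return node

doc = minidom.parseString("<PLAY>\n  <TITLE>Hamlet</TITLE>\n  <ACT/>\n</PLAY>")
before = len(doc.documentElement.childNodes)   # whitespace nodes included
strip_whitespace_nodes(doc)
after = len(doc.documentElement.childNodes)    # only TITLE and ACT remain

# Re-indent on output instead of storing the indentation in the tree.
pretty = doc.toprettyxml(indent="  ")
print(pretty)
```

Keeping the stripping step separate from parsing means a caller who
wants the whole thing, whitespace and all, can simply skip it.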

	I noticed this when parsing one of Jon Bosak's marked-up
Shakespeare plays with xmllib and converting it to a DOM tree.  The
document node's children were [<Element node "PLAY">, <Text node "
">].  Document nodes can't have Text nodes as children [reference: DOM
1.1.1], so this would be whitespace that *must* be ignored; once the
Document node does stricter checking on its children (and it will do
so, after tonight), attempting to add the Text node would raise a
HierarchyRequestException.
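The same check can be seen in the standard-library xml.dom.minidom,
used here only as a stand-in for the implementation discussed above.
There the exception is named HierarchyRequestErr; the sketch below
assumes that module and nothing else.

```python
import xml.dom
from xml.dom import minidom

doc = minidom.parseString("<PLAY/>")
stray = doc.createTextNode("\n")

# A Document may not hold Text nodes directly, so this append is refused.
try:
    doc.appendChild(stray)
    rejected = False
except xml.dom.HierarchyRequestErr:
    rejected = True

print(rejected, len(doc.childNodes))
```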

	Actually, this is a good side issue: what should the standard
interface for making a DOM tree using a given parser be?  You can use
SAX easily enough:

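import sys
from xml.sax import saxlib, saxexts
from sax_builder import SaxBuilder
import writer

# Create a SAX parser and attach the DOM-building document handler.
p = saxexts.make_parser()
dh = SaxBuilder()
p.setDocumentHandler(dh)

# Parse the file named on the command line.
p.parse(sys.argv[1])

# The finished DOM tree is available on the handler.
doc = dh.document
print doc.toxml()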

But SAX loses some information, such as comments, so it can't be a
general solution.  You'd want parser-to-DOM drivers for PyExpat,
xmllib, xmlproc, etc. that preserved as much of the input XML as
possible.  Does anyone want to suggest an interface for this?
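To make the information loss concrete, here is a small sketch that
feeds the same input to a plain SAX content handler and to a DOM
builder.  It assumes the standard-library xml.sax and xml.dom.minidom
modules rather than the saxlib/sax_builder code above, and the
EventLog class is invented for the example.

```python
import xml.sax
from xml.dom import minidom, Node

SOURCE = b"<PLAY><!-- stage direction --><ACT/></PLAY>"

class EventLog(xml.sax.ContentHandler):
    """Hypothetical handler that records every content event it receives."""
    def __init__(self):
        super().__init__()
        self.events = []
    def startElement(self, name, attrs):
        self.events.append(("start", name))
    def endElement(self, name):
        self.events.append(("end", name))
    def characters(self, content):
        self.events.append(("text", content))

# The content handler interface has no callback for comments,
# so the comment never shows up in the recorded events.
log = EventLog()
xml.sax.parseString(SOURCE, log)

# A parser-specific DOM builder can keep the comment as a node.
doc = minidom.parseString(SOURCE)
dom_types = [child.nodeType for child in doc.documentElement.childNodes]

print(log.events)
print(dom_types == [Node.COMMENT_NODE, Node.ELEMENT_NODE])
```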

An off-the-top-of-my-head strawman proposal might be:

from xml.dom.builder import make_dom
tree = make_dom(sysID or file object [, parser = 'specific parser to use'] )
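One possible shape for such a function, purely as a sketch: the
registry _BUILDERS, the parser names 'expat' and 'sax', and the choice
of xml.dom.minidom as the tree implementation are all assumptions made
for illustration, not part of the proposal itself.

```python
import io
import xml.sax
from xml.dom import minidom

# Hypothetical registry mapping a parser name to a DOM-building callable.
# Each callable accepts a system ID (file name) or an open file object.
_BUILDERS = {
    "expat": lambda source: minidom.parse(source),
    "sax": lambda source: minidom.parse(source, parser=xml.sax.make_parser()),
}

def make_dom(source, parser="expat"):
    """Build a DOM tree from a file name or file object (sketch only)."""
    try:
        build = _BUILDERS[parser]
    except KeyError:
        raise ValueError("unknown parser: %r" % (parser,)) from None
    return build(source)

tree = make_dom(io.BytesIO(b"<PLAY><ACT/></PLAY>"))
root_name = tree.documentElement.tagName
print(root_name)
```

A registry keeps the caller-facing signature fixed while leaving room
to add drivers for other parsers later.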

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
Anyone who considers arithmetical methods of producing random digits is, of
course, in a state of sin.
    -- John Von Neumann