[XML-SIG] DOM: Multiple proxy problem

Sean Mc Grath digitome@iol.ie
Wed, 07 Oct 1998 09:07:06 +0100


[Fred Drake]

>  Frankly, what I want is to be able to address the tree structure at
>the application level and update things I need to, but otherwise not
>affecting the rest of the document instance.  Given the general need
>to be able to handle well-formed XML in a non-destructive manner, I'd
>like to be able to write out the modified instance with the fewest
>possible changes; I'd really like "diff" output between the original
>and edited files to be minimal.

This is one of the great debate points that surrounds XML and surrounded
SGML for ten years before it. In an ideal world, we programmers like
to be able to do null transformations. In XML terms, we want to
be able to go XML(1) -> DOM -> XML(2) such that XML(1) and XML(2) are
"equivalent".

The difficulty is in finding agreement on what "equivalent" means.
In SGML, certain whitespace is always considered insignificant. So,
if it disappears in the null transformation round-trip, no information
is lost. So in SGML, if you know that white space is insignificant
per your DTD, you can work with clean tree structures, i.e.:-

This:
	<foo>
	Hello World
	</foo>

comes out like this in a generated tree:-

        /---\
        |foo|
        \---/
          |
    |-----------|
    |Hello World|
    |-----------|


In XML, on the other hand, *all* whitespace is considered significant.
All whitespace must be passed to the application. So at the very
least you need this for null transformation purposes:-

               /---\
               |foo|
               \---/
                 |
         |---------------|
         |\nHello World\n|
         |---------------|
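
To see this from code, here is a minimal sketch. It assumes Python's
standard xml.dom.minidom module as the DOM, which is my choice for
illustration and not something this thread depends on. It parses the
<foo> example and checks that the newlines survive the round trip:

```python
from xml.dom import minidom

SOURCE = "<foo>\nHello World\n</foo>"

doc = minidom.parseString(SOURCE)
text_node = doc.documentElement.firstChild

# The parser hands over the newlines along with the words.
print(repr(text_node.data))

# Serializing the element again gives back the original markup,
# which is the null transformation for this tiny document.
round_tripped = doc.documentElement.toxml()
print(round_tripped == SOURCE)
```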

It gets a bit hairier though:-(. To maintain compatibility with SGML's
notion of insignificant white space, validating XML parsers split white
space into two types:- space known to be significant and space that
can be safely thrown away. The latter is white space that occurs in
content models that do not have the #PCDATA keyword.
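
A rough sketch of that two-way split, done by hand on a DOM tree.
The ELEMENT_ONLY set and the <list>/<item> names are hypothetical;
they stand in for what a validating parser would learn from the DTD.
The xml.dom.minidom module is again just an assumed DOM:

```python
from xml.dom import minidom, Node

# Assumption: elements whose content model has no #PCDATA.
# A validating parser would get this from the DTD; here it is hard-coded.
ELEMENT_ONLY = {"list"}

def classify_whitespace(node, found=None):
    """Collect (parent name, text, discardable) for whitespace-only text nodes."""
    if found is None:
        found = []
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE and child.data.strip() == "":
            discardable = node.nodeName in ELEMENT_ONLY
            found.append((node.nodeName, child.data, discardable))
        elif child.nodeType == Node.ELEMENT_NODE:
            classify_whitespace(child, found)
    return found

doc = minidom.parseString("<list>\n<item> </item>\n</list>")
results = classify_whitespace(doc.documentElement)
for parent, text, discardable in results:
    print(parent, repr(text), "discardable" if discardable else "significant")
```

The newlines directly inside <list> are marked discardable, while the
single space inside <item> is kept as significant because <item> is
assumed to allow character data.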

In a nutshell, all white space needs to be passed by the parser
to the application - even if the application is DOM:-) Whether
or not that white-space is classified into two types depends on
the capabilities of the XML parser underfoot.

In XML, it is up to the application (say XBEL) to make a statement
about white space. "White space is always significant" or "white
space after start-tags and before end-tags is always insignificant".
That kind of thing. Basically, on an application by application
basis the notion of "equivalence" varies.
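
As an illustration of how the answer shifts with the rule chosen,
here is a sketch that compares two documents either literally or under
the second rule above. The function names are made up for this example,
and xml.dom.minidom is an assumed DOM, not anything XBEL prescribes:

```python
from xml.dom import minidom, Node

def trim_tag_edges(element):
    """Drop white space right after a start-tag and right before an end-tag."""
    first = element.firstChild
    if first is not None and first.nodeType == Node.TEXT_NODE:
        first.data = first.data.lstrip()
        if not first.data:
            element.removeChild(first)
    last = element.lastChild
    if last is not None and last.nodeType == Node.TEXT_NODE:
        last.data = last.data.rstrip()
        if not last.data:
            element.removeChild(last)
    for child in element.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            trim_tag_edges(child)

def equivalent(xml_a, xml_b, trim=False):
    """Compare two documents, optionally ignoring white space at tag edges."""
    docs = [minidom.parseString(text) for text in (xml_a, xml_b)]
    if trim:
        for doc in docs:
            trim_tag_edges(doc.documentElement)
    return docs[0].documentElement.toxml() == docs[1].documentElement.toxml()

a = "<foo>\nHello World\n</foo>"
b = "<foo>Hello World</foo>"
print(equivalent(a, b))             # all white space significant
print(equivalent(a, b, trim=True))  # tag-edge white space insignificant
```

The same pair of documents is different under one rule and equivalent
under the other, which is why the application has to say which rule
applies.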

Note that I have not even mentioned the word "entities". Don't
get me started on entities....

To make a long story even longer, XBEL should make some statement
about the significance or otherwise of whitespace. There is no
formalism for doing so, no declarative syntax. Just a statement.


Sean Mc Grath

def Get_URI_Of_Superlative_Scripting_Language():
	return "http://www.python.org"