[XML-SIG] DOM API

Paul Prescod paul@prescod.net
Sat, 24 Apr 1999 12:52:54 -0500


Greg Stein wrote:
> 
> XML 1.0 *defines* PIs. That is very different.

Okay, so you agree that PIs are part of XML document instance data. Let me
ask you this, do you think that Gadfly should dump the parts of the SQL
spec that Aaron doesn't like?

> Per my other email (treatise? :-), I think that I've discovered we are
> operating within two classes of applications:
> 
> * data-oriented use of XML
> * layout-oriented use of XML

This is a false dichotomy. Many of my customers are data-oriented people
who want to style their data. For instance, I was at a statistical company
last week.

I gave you four specifications that used PIs: XML, xml-stylesheet, DCD and
DDML. Only one of those four has anything to do with stylesheets or
formatting. The other three are as applicable to data as to traditional
documents.

> For the former, I have not seen a case where a PI is necessary. For the
> latter: yes, you need a PI for stylesheets. Too bad... you get to use
> the DOM :-)

So to keep PIs out we should split the interface and (further) confuse new
Python programmers?

> Instead, the
> client has to reach into the internals of the DOM to set (and get!) the
> namespace info. Bleck!

Well, I've decided to put namespace info into minDOM even though it made
it significantly less "lightweight." 

> I maintain that the stylesheets are not applicable to certain classes of
> XML processing. So yes, they get punted too.

If there is a class of processing that does not use a feature then the
feature should be removed? Goodbye namespaces. Goodbye sub-elements.

> A simple API of elements and text is more than suitable.

Not data access APIs. XML's semantics are partially defined in the XML
specification itself and will be fully specified in an upcoming
specification called the "XML Information Set."

http://www.w3.org/TR/NOTE-xml-infoset-req

"The XML Information Set will describe these abstract XML objects and
their properties. It will provide a common reference set that other
specifications can use and extend to construct their underlying data
models, and will help to ensure interoperability among the various
XML-based specifications and among XML software tools in general."

Technical and intellectual interoperability is what I'm fighting for.

> Your spec didn't show it. Okay... so it has ChildNodes. How do you get
> the root element? Oops. You have to scan for the thing. Painful!

doc.childNodes
doc.documentElement
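
For illustration only, here is roughly how those two attributes behave. This
is a sketch that assumes today's standard-library xml.dom.minidom, which may
differ from the minidom being discussed in this thread; the sample document
is made up.

```python
from xml.dom import minidom

# Hypothetical sample: a stylesheet PI precedes the root element.
SAMPLE = (
    '<?xml-stylesheet href="style.css" type="text/css"?>'
    "<report><row>1</row></report>"
)

doc = minidom.parseString(SAMPLE)

# childNodes holds every top-level node, including the PI.
top_level = [node.nodeName for node in doc.childNodes]

# documentElement goes straight to the root element; no scanning needed.
root = doc.documentElement

print(top_level, root.tagName)
```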

> It will *never* be more efficient. Accessing a Python attribute and
> doing a map-fetch will always be faster than a method call. Plain and
> simple.

This gets back to Mike's question: Are we creating a new library here or
defining a new *interface*? If we're defining a library then we know all
of the performance implications in advance.

If we are defining an interface, on the other hand, then we need to
consider implementations that do not use Python hashes underneath.
Generating the hash or map-wrapper could be expensive.
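
A minimal sketch of that concern, using a hypothetical class (LazyAttrsElement
and its fields are invented for illustration and belong to no library
mentioned here). To the client it looks like a plain attribute plus a map
fetch, but the mapping has to be generated from some other representation
first.

```python
class LazyAttrsElement:
    """Hypothetical element whose attributes live in a non-dict store."""

    def __init__(self, name, raw_attrs):
        self.name = name
        # Stand-in for a foreign representation, e.g. a flat array
        # handed over by a parser written in another language.
        self._raw_attrs = tuple(raw_attrs)
        self._attr_cache = None
        self.builds = 0  # how many times the dict had to be generated

    @property
    def attrs(self):
        # Reads like attribute access, but the mapping is built on first
        # use; that build step is the cost hidden behind the interface.
        if self._attr_cache is None:
            self._attr_cache = dict(self._raw_attrs)
            self.builds += 1
        return self._attr_cache


elem = LazyAttrsElement("item", [("id", "42"), ("lang", "en")])
value = elem.attrs["id"]
again = elem.attrs["lang"]
```

Caching keeps the build to a single pass here, but an implementation that
cannot cache would pay it on every access.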

> The origin of qp_xml was for efficiency first, simplicity second. I
> maintain that qp_xml provides both.

first_cdata, following_cdata, non-recursive text dumping? Doesn't seem
very simple to me. It is completely unlike any API I have ever seen, even
in strongly typed programming languages where it would seem more
appropriate.

> Again, back to this "dynamically typed language". That is your point of
> view, rather than a statement of fact. I won't attempt to characterize
> how you derived that point of view (from the DOM maybe?), but it is NOT
> the view that I hold.

The contents of an element are, *by definition*, elements, characters and
processing instructions. You can't wish that fact away. That's a
heterogeneous list.

WD-XML: "PIs are not part of the document's character data, but must be
passed through to the application."
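
To make the heterogeneous list concrete, here is a small sketch. It assumes
the modern standard-library xml.dom.minidom rather than any of the libraries
argued over in this thread, and the sample markup and the "pagebreak" PI
target are invented.

```python
from xml.dom import Node, minidom

doc = minidom.parseString("<p>See <?pagebreak soft?><b>this</b> text</p>")

# Map the node-type constants onto readable labels.
KINDS = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.PROCESSING_INSTRUCTION_NODE: "pi",
}

# One element's content: text, a PI, an element and text again, in order.
content = [
    (KINDS[child.nodeType], child.nodeName)
    for child in doc.documentElement.childNodes
]
print(content)
```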

> XML is a means of representing structured data. That structure takes the
> form of elements (with attributes) and contained text. I do not see how
> XML is a programming language, or that it is dynamically typed. It is
> simply a representation in my mind.

XML is not a programming language, but it explicitly supports heterogeneous
lists.

> And I'll ignore the quote which just seems to be silliness or
> flamebait...

My point: I don't think Python implementors should try to pretend that
Python does not (for example) support heterogeneous lists, and neither
should XML implementors.

> > > Moreover, it was also a total bitch to simply say "give me the child
> > > elements". Of course, that didn't work since the DOM insisted on returning
> > > a list of a mix of CDATA and elements.
> >
> > It told you what was in your document.
> 
> I also get that from qp_xml with a lot less hassle, so that says to me
> that the DOM is introducing needless complexity/hassle for the client.

It isn't needless complexity if you need the PIs. I could find an
application of XML that doesn't use attributes -- do we now define an API
that dumps those too?
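
For what it is worth, "give me the child elements" can be layered on top of
the full node list in a few lines. This is a sketch against the modern
standard-library xml.dom.minidom; child_elements is a hypothetical helper,
not part of any DOM API, and the sample document is invented.

```python
from xml.dom import Node, minidom


def child_elements(node):
    """Return only the element children, skipping text, PIs and comments."""
    return [c for c in node.childNodes if c.nodeType == Node.ELEMENT_NODE]


doc = minidom.parseString(
    "<order>\n  <item/>\n  <?audit checked?>\n  <item/>\n</order>"
)

names = [e.tagName for e in child_elements(doc.documentElement)]

# The PI is still in the tree for applications that need it.
pis = [
    c.target
    for c in doc.documentElement.childNodes
    if c.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]
```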

> The only "structure" that I toss are PIs and comments. I do not view
> those as "structure". The contents (elements, attributes, text) are
> retained and can be reconstructed from the structure that qp_xml
> returns.

Fortunately it is not up to us to define XML. The XML specification says
that processors should pass them along to applications.

> Most of the DOM's interface is for *building* a DOM structure. It is
> conceivable that those APIs only exist as a way to respond to parsing
> events, but I believe their existence is due to the fact that people
> want to build a DOM and then generate the resulting XML. 

In some cases they do. In other cases they read a DOM, make a small
modification and then write that. In still other cases, they make a DOM,
edit by hand in a graphical, DOM-based editor and then write that out. In
yet other cases, DOM modifications are performed in order to create a
graphical effect in a browser.
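
The read, modify, write case might look like the following sketch. It
assumes the modern standard-library xml.dom.minidom, which is read-write and
so is not the read-only minidom described in this thread; the config
document is made up.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<config><option name="debug" value="off"/></config>'
)

# Make one small modification to the parsed tree...
option = doc.getElementsByTagName("option")[0]
option.setAttribute("value", "on")

# ...and serialize the result again.
output = doc.documentElement.toxml()
print(output)
```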

> Otherwise, we
> could have had two levels of the DOM interface: read-only (with private
> construction mechanisms), and read-write (as exemplified by the current
> DOM).

That's exactly what we have. Minidom is the read-only version with private
construction mechanisms and PyDOM/4DOM are read-write. 

> I could care less about compatibility. I'm trying to write an
> application here. 

If you could care less about compatibility, maybe you shouldn't be using
XML. XML is about compatibility.

> Geez... using your viewpoint: if I wanted
> compatibility, then maybe I should use Java or C since everybody else
> uses that.

Slavish adherence to conventions is not a good idea, but neither is
reinventing wheels. From my point of view that's exactly what qp_xml does.

> Goody for them. That doesn't help me write my application.

You have a library. It works for you. What's the problem? Now you want to
make it a standard API. That means that user interface concerns become
important. Some important principles of interface design are:

 * reuse what people already know
 * do not unnecessarily multiply interfaces

People know, and seem to like, the DOM. A subset can be made about as fast,
convenient and small as qp_xml.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

Company spokeswoman Lana Simon stressed that Interactive 
Yoda is not a Furby. Well, not exactly. 

"This is an interactive toy that utilizes Furby technology," 
Simon said. "It will react to its surroundings and will talk." 
  - http://www.wired.com/news/news/culture/story/19222.html