[XML-SIG] DOM API

Tue, 20 Apr 1999 00:51:35 -0700 (PDT)

On Mon, 19 Apr 1999, Paul Prescod wrote:
> I'm going to propose instead a light-weight DOM subset. I would rather not
> require PyXML users to memorize two different APIs depending on whether
> they are doing light-weight work or heavy-weight work. Apart from my decision
> to suggest a DOM subset, I have made my subset a little more functional in
> some places and a little less in others. My bias is to expose *more* of
> the underlying XML structure (processing instructions, attributes) and
> relegate handling for lang and namespace to the more complex APIs (or
> extensions to this API).

euh... I can definitely state that, in the applications that I've been
working with, PIs are bogus, but namespaces are absolutely required.
(that's how my code came to be!)

A general comment about your "subset" -- it is still heavyweight! Details
below...

> Parser.parse(input) (like qp_xml.parse but returns a document object)

How is a "document" different, in your mind, from an element that happens
to be the root of a tree? I don't understand from your post. IMO, if you
want simple, then just give the user a tree... that's all the dumb XML is
anyhow.
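
A minimal sketch of the "just give the user a tree" idea, assuming a
parse() function that hands back the root element directly. The Element
class and its field names are hypothetical and are not taken from qp_xml
or from the proposal; only the expat calls are real library API.

```python
import xml.parsers.expat


class Element:
    """Hypothetical light-weight node: name, attribute dict, children, text."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = attrs      # plain dict of attribute name -> value
        self.children = []      # child Element objects only
        self.text = ""          # character data found directly inside


def parse(data):
    """Parse an XML string and return the root Element (no document wrapper)."""
    stack = []    # elements that are currently open
    roots = []    # receives the single root element

    def start(name, attrs):
        elem = Element(name, attrs)
        if stack:
            stack[-1].children.append(elem)
        else:
            roots.append(elem)
        stack.append(elem)

    def end(name):
        stack.pop()

    def chars(text):
        if stack:
            stack[-1].text += text

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(data, True)
    return roots[0]


root = parse('<a x="1"><b>hi</b><c/></a>')
print(root.name, [child.name for child in root.children])
```

The caller gets the root of the tree in one step; there is no wrapper
object to unpack first.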

> Node.ChildNodes (a sequence of nodes property)
> Node.NodeType (an integer a la DOM property)

NodeType is bogus. It should be absolutely obvious from the context what a
Node is. If you have so many objects in your system that you need NodeType
to distinguish them, then you are certainly not a light-weight solution.

> Document.DocumentElement (an element node property)

If Document has no other properties, then it is totally bogus. Just return
the root Element. Why the hell return an object with a single property
that refers to another object? Just return that object!

> Element.Attributes (a map of names to attribute objects property)
> Element.GetAttribute (returns an attribute's value)

If you want light-weight, then GetAttribute is bogus given that the same
concept is easily handled via the .Attributes value. Why introduce a
method to simply do Element.Attributes.get(foo) ??
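
To illustrate, here is a small sketch that assumes .Attributes is an
ordinary dictionary mapping names straight to values. The Element class
is a hypothetical stand-in, not part of any existing package.

```python
class Element:
    """Hypothetical element whose attributes are a plain dict."""

    def __init__(self, tag_name, attributes):
        self.TagName = tag_name
        self.Attributes = attributes


elem = Element("link", {"href": "http://www.lyra.org/"})

href = elem.Attributes.get("href")           # value of a present attribute
missing = elem.Attributes.get("title")       # None when the attribute is absent
fallback = elem.Attributes.get("title", "")  # caller-chosen default instead
```

Everything a GetAttribute method would do is already covered by the
mapping's own get().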

> Element.TagName
> Element.PreviousSibling
> Element.NextSibling

These Sibling things mean one of two things:

1) you have introduced loops in your data structure
2) you have introduced the requirement for the proxy crap that the current
DOM is dealing with (the Node vs _nodeData thing).

(1) is mildly unacceptable in a light-weight solution (you don't want
people to do a quick parse of data, and then require them to follow it up
with .close()). (2) throws the whole notion of "light" out the window. You
no longer have a simple, direct model of the parsed XML data.
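
Point (1) can be demonstrated with a short sketch. It assumes CPython's
reference counting, and the Node class is hypothetical. The cycle
collector is switched off here to mimic an interpreter that relies on
reference counts alone.

```python
import gc
import weakref


class Node:
    """Hypothetical node carrying direct sibling links."""

    def __init__(self, name):
        self.name = name
        self.next_sibling = None
        self.previous_sibling = None


def build_pair():
    # Two siblings that point at each other form a reference cycle.
    first, second = Node("first"), Node("second")
    first.next_sibling = second
    second.previous_sibling = first
    return weakref.ref(first)   # a probe that does not keep the node alive


gc.disable()                    # reference counting only
try:
    probe = build_pair()
    # Every outside reference is gone, yet the nodes keep each other alive.
    alive_after_drop = probe() is not None
    gc.collect()                # explicit cycle collection
    alive_after_collect = probe() is not None
finally:
    gc.enable()

print(alive_after_drop, alive_after_collect)
```

Without a cycle collector, the only way to free such a tree is to break
the links by hand, which is what a .close() call would be for.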

> CharacterData.Data (a PyString property)

How do you get one of these objects? As soon as you say that an
Element.ChildNodes can return one of these, then you have complicated the
model. To keep things simple, .ChildNodes should return objects of the
*same* type. Otherwise, all the clients are going to need to test the
contents. Clients will also have a hard time finding the right data.

Case in point: I wrote a first draft of davlib.py against the DOM. Damn it
was a serious bitch to simply extract the CDATA contents of an element!
Moreover, it was also a total bitch to simply say "give me the child
elements". Of course, that didn't work since the DOM insisted on returning
a list of a mix of CDATA and elements.
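
The filtering involved looks roughly like this. The sketch uses
xml.dom.minidom from today's Python standard library as a stand-in; the
DOM package discussed in this thread may differ in its details, and the
sample document is made up.

```python
from xml.dom import minidom

doc = minidom.parseString("<prop>some <b>bold</b> text<i/></prop>")
root = doc.documentElement

# childNodes mixes text and element nodes, so each use has to filter.
child_elements = [n for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]
text = "".join(n.data for n in root.childNodes if n.nodeType == n.TEXT_NODE)

print([e.tagName for e in child_elements])
print(repr(text))
```

Both "give me the child elements" and "give me the text" need a
nodeType test on every child.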

The whole notion of mixing "node types" in a list is completely bogus if
you want direct simplicity in a model. It is one of my biggest problems
with the DOM thing. Some yahoos over in the XML DOM world want all this
nifty OO crap, yet they have built something that is hardly usable in a
practical application. Ergo, we have all kinds of filters and walking
solutions just to deal with mapping the complicated DOM structure into
something that is even marginally useful.

IMO, the XML DOM model is a neat theoretical expression of OO modelling of
an XML document. For all practical purposes, it is nearly useless. (again:
IMO) ... I mean hey: does anybody actually use the DOM to *generate* XML?
Screw that -- I use "print". I can't imagine generating XML using the DOM.
Complicated and processing intensive.
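
A rough sketch of the "print" approach, for comparison. The element()
helper is hypothetical; escape() and quoteattr() are real functions from
xml.sax.saxutils in the current Python standard library, used here so
that special characters do not break the output.

```python
from xml.sax.saxutils import escape, quoteattr


def element(name, text, **attrs):
    """Format one element with escaped text and quoted attribute values."""
    attr_text = "".join(
        " %s=%s" % (key, quoteattr(value)) for key, value in attrs.items()
    )
    return "<%s%s>%s</%s>" % (name, attr_text, escape(text), name)


line = element("title", "Tom & Jerry", lang="en")
print(line)
```

No tree is built at all: the output is assembled as strings and written
out directly.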

Sorry to go off here, but the DOM really bugs me. I think it is actually a
net-negative for the XML community to deal with the beast. I would love to
be educated on the positive benefits for expressing an XML document thru
the DOM model.

> Attribute.Name
> Attribute.Value

Use a mapping. Toss the intermediate object. If you just have name and
value, then you don't need separate objects. Present the attributes as a
mapping.

> ProcessingInstruction.Target (string property)
> ProcessingInstruction.Data (string property)

I have yet to see a specification related to XML that depends on PIs.
Until that happens, then I don't see how these are relevant.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/