[Doc-SIG] DOM as output, not as internal

Tony J Ibbs (Tibs) tony@lsl.co.uk
Fri, 15 Jun 2001 10:43:53 +0100


I said:
> 9. DOM - my experience is that trying to use a DOM tree to
> do the work in leads to madness (I'll elucidate if anyone
> cares, but basically it's clearly too constraining).

David Goodger said:
> Please do elucidate. I haven't gotten to the point of
> actually *using* the DOM yet, so I'm open to alternatives.
> Using DOM's not written in stone, but the DOM software
> does exist and seems to fit the bill.

I said:
> However, I still reckon that producing a DOM
> tree as final *output* is very useful.

David said:
> PEP 258 specifies DOM for the output of the docstring parser
> and input of the formatter. Does that (the former) qualify
> as "final output"?

Yes, that's exactly what I meant (I've not been concerning myself with
the formatter particularly).


To elucidate: I was thinking about the datastructures that one
manipulates whilst building the document representation.

Since the DOM is not designed for subclassing, if one wants *convenient*
behaviours for one's classes, one has to either stick fingers into (for
instance) minidom.py and *assume* that that is what will be used, or
else reimplement one's own DOMlet.

The former was the approach I took originally, and it was a pain because
I had all this infrastructure that didn't *quite* match how I wanted to
think about the document structure *whilst I was building it*, but the
killer is that there is no guarantee that the minidom implementation
won't change over Python releases (as indeed it has).

The latter seems to be the approach that Zope has taken, but it's a pain
for other reasons - who wants to maintain their own DOM when (a) other
people are already doing it, and (b) there is more than one of them? (I
think it is important that although we provide the ability to *produce*
a DOM tree, we not mandate *whose* implementation we use.)

So what I did was to produce my own classes, so that I process the
docstring and produce a datastructure representing it, and have an
appropriate method to produce a DOM from that. This means that the
person wanting to exploit the information from the docstring need only
understand how DOM works (and, of course, what schema we are using!),
and not my particular set of classes. Given the schema (DTD if you
will), different implementations can be used to *produce* the DOM tree,
which is exactly what we want.
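
As a minimal sketch of that arrangement (not the actual classes, and
not the PEP 258 schema): the class names `Paragraph` and `Docstring`,
the `to_dom` method name, and the element names "docstring" and
"paragraph" are all hypothetical, and minidom is used only because it
ships with Python.

```python
from xml.dom import minidom


class Paragraph:
    """Hypothetical internal node: one paragraph of plain text."""

    def __init__(self, text):
        self.text = text

    def to_dom(self, doc):
        # Only core DOM factory methods are used here.
        element = doc.createElement("paragraph")
        element.appendChild(doc.createTextNode(self.text))
        return element


class Docstring:
    """Hypothetical internal root, shaped for convenience while building."""

    def __init__(self):
        self.paragraphs = []

    def add(self, text):
        self.paragraphs.append(Paragraph(text))
        return self

    def to_dom(self):
        # The DOM only appears at the very end, as output.
        doc = minidom.Document()
        root = doc.createElement("docstring")
        doc.appendChild(root)
        for paragraph in self.paragraphs:
            root.appendChild(paragraph.to_dom(doc))
        return doc


tree = Docstring().add("Return the sum.").add("Raises TypeError.")
dom = tree.to_dom()
```

A consumer of `dom` then needs to know only the DOM API and the agreed
element names, not anything about `Docstring` or `Paragraph`.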

If the DOM producing method takes an existing DOM node as an argument,
then we can even say we don't care *which* DOM implementation is being
used (because one can get at the document from any node, and thus the
factory methods) - assuming that the API for all Python DOM
implementations is sufficiently similar (I *assume* that minidom and
4DOM, for instance, might be very similar for the sorts of operations we
want to do - this may be naive, of course).
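
A rough sketch of the "pass in an existing node" idea follows. The
function name `append_paragraph` and the element names are
hypothetical; `ownerDocument`, `createElement`, `createTextNode` and
`appendChild` are standard DOM attributes and methods. It is exercised
here with minidom only, so whether other implementations behave
identically remains an assumption.

```python
from xml.dom import minidom


def append_paragraph(parent, text):
    """Append a <paragraph> child to `parent` using only core DOM calls."""
    # Any node leads back to its document, and thus to the factory
    # methods. A Document node's own ownerDocument is None, so fall
    # back to the node itself in that case.
    doc = parent.ownerDocument
    if doc is None:
        doc = parent
    element = doc.createElement("paragraph")
    element.appendChild(doc.createTextNode(text))
    parent.appendChild(element)
    return element


# The caller decides which DOM implementation to use; here it is minidom.
doc = minidom.parseString("<docstring/>")
root = doc.documentElement
append_paragraph(root, "Built with the supplied DOM.")
```

Nothing inside `append_paragraph` imports or names a DOM package; the
choice of implementation is made entirely by whoever supplies `parent`.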

In summary, mandating that the DOM be used for the intermediary between
parser and formatter frees us up to choose different parsers and
formatters at will, but I personally don't think that trying to use DOM
inside the parser gives useful advantages (and it is fairly trivial
to *construct* a DOM from any sane internal datastructure!).

Does that make sense?

Tibs

--
Tony J Ibbs (Tibs)      http://www.tibsnjoan.co.uk/
"Bounce with the bunny. Strut with the duck.
 Spin with the chickens now - CLUCK CLUCK CLUCK!"
BARNYARD DANCE! by Sandra Boynton
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)