[Doc-SIG] Re: DPS and DOM trees

Tony J Ibbs (Tibs) tony@lsl.co.uk
Fri, 31 Aug 2001 10:22:49 +0100


David Goodger wrote:
> Actually, after trying to use the DOM (xml.dom.minidom) directly, I
> saw what you & Garth & Guido meant by its inconvenience and coded up
> the nodes.py class library instead. That let me make the classes
> convenient to use, with polymorphism and other OO goodies.

Well, it's inconvenient to *create*, but I have an odd affection for it
as a datastructure to have to hand (too much XML stuff going on in the
background here, I guess - our field's extensive use of XML schema is
bound to be corrupting me as well).

I do still agree that the approach of regarding the DOM tree as an
"output format" in this case is Very Sensible.

> I added DOM output (.asdom()) as an option, not because it was needed
> for anything, but (violating XP, I know) because it was easy and
> served as a proof of concept.

No, it *is* useful.
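To make the .asdom() idea concrete, here is a minimal sketch of how a nodes.py-style tree might render itself into xml.dom.minidom. The Node, Document and Paragraph classes are hypothetical stand-ins, not the actual nodes.py API. Only the minidom calls are real.

```python
from xml.dom import minidom


class Node:
    """Hypothetical stand-in for a nodes.py element: tag, attributes, children."""

    tagname = "node"  # default element name; subclasses override it

    def __init__(self, *children, **attributes):
        self.children = list(children)
        self.attributes = attributes

    def asdom(self, doc=None):
        # Default to a fresh minidom Document when none is supplied
        if doc is None:
            doc = minidom.Document()
        element = doc.createElement(self.tagname)
        for name, value in self.attributes.items():
            element.setAttribute(name, str(value))
        for child in self.children:
            if isinstance(child, Node):
                element.appendChild(child.asdom(doc))
            else:
                # Plain strings become DOM #text nodes
                element.appendChild(doc.createTextNode(str(child)))
        return element


class Document(Node):
    tagname = "document"


class Paragraph(Node):
    tagname = "paragraph"


tree = Document(Paragraph("Hello, ", "world"), source="example.txt")
print(tree.asdom().toxml())
```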

I was looking at PyPaSax again last night [1]_ - if it didn't
depend on PyXML being installed, I'd be considering it for doing the
Python parsing job (as it is, I've started to take some inspiration
from their (partial) DTD - at some point I'll write a Proper schema for
the Python part, and using a "py" namespace for it, for instance, seems
very sensible to me).

Having the DOM tree as the interface means that we could slot the DPS
functionality into PyPaSax and lo, presto, they'd have docstring
structure instead of a docstring that is just a #text node.
Alternatively, if the DOM tree is the interface to the formatter, they
benefit from that work as well [bias]_.
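A rough sketch of what "structure instead of #text" could mean in DOM terms: take a docstring held as a bare #text node and replace it with structured child elements. The input markup and the split-on-blank-lines rule are hypothetical placeholders for a real reST parse, and are not PyPaSax's actual output format.

```python
from xml.dom import minidom

# Hypothetical PyPaSax-like input: the docstring is a single bare #text node
dom = minidom.parseString(
    "<function name='ham'><docstring>First para.\n\nSecond para.</docstring></function>")
docstring = dom.getElementsByTagName("docstring")[0]
raw = docstring.firstChild  # the #text child

# Hypothetical structuring step: each blank-line-separated chunk becomes
# a <paragraph> element (a real DPS parser would do far more than this)
for chunk in raw.data.split("\n\n"):
    para = dom.createElement("paragraph")
    para.appendChild(dom.createTextNode(chunk.strip()))
    docstring.appendChild(para)
docstring.removeChild(raw)  # drop the original unstructured text

print(docstring.toxml())
```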

.. [bias] I admit to historical bias here - I already know how easy
   it is to output HTML from a DOM tree - indeed, that's how the
   embryonic ``docutils`` did it, so I can learn from my experience
   with that code. I'll probably go that route because it's easy
   for me - we can always back port to the "internal" classes later
   on (I don't see that as hard). Also, selfishly, practice at
   thinking in XML related datastructures is a Useful Thing for me
   to have (that's *undoubtedly* bad XP practice!).
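For the record, emitting HTML from a DOM tree really is short work. This sketch walks an xml.dom.minidom tree recursively; the element names and the HTML_TAGS mapping are illustrative assumptions, not the old docutils code.

```python
from html import escape
from xml.dom import minidom

# Hypothetical mapping from document element names to HTML tags
HTML_TAGS = {"document": "body", "section": "div", "title": "h1",
             "paragraph": "p", "emphasis": "em"}


def to_html(node):
    """Recursively render a minidom node as an HTML string."""
    if node.nodeType == node.TEXT_NODE:
        return escape(node.data)
    if node.nodeType not in (node.ELEMENT_NODE, node.DOCUMENT_NODE):
        return ""  # skip comments, processing instructions, etc.
    inner = "".join(to_html(child) for child in node.childNodes)
    if node.nodeType == node.DOCUMENT_NODE:
        return inner
    tag = HTML_TAGS.get(node.tagName, "span")  # unknown elements become <span>
    return f"<{tag}>{inner}</{tag}>"


dom = minidom.parseString(
    "<document><title>Notes</title>"
    "<paragraph>Easy <emphasis>enough</emphasis> &amp; quick</paragraph></document>")
print(to_html(dom))
```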

> Nothing works off an actual DOM tree directly. The parser creates a
> nodes.py tree, and unless necessity arises I don't see why the rest of
> the DPS (output formatters included) shouldn't use the same as their
> interface data structure.

I agree, save for the bias_ admitted above (heh, neat, a use for
footnotes-as-references!).

> > I'm still at a *very* early stage in navigating around the source
> > code.
>
> Not helped much by the paucity of documentation; apologies.

I think we code in different styles (or, perhaps, you code generically
earlier than I would), which means that there is more of a "tangle" of
interdependencies to disentangle, where I would introduce that generality
later on (refactoring being fun). But thinking is good for one, and
generally leads to good results.

And I can already see a niche for myself later on in adding docstrings
(yes, that probably was volunteering).

> > Is the stuff in nodes.py the class structure for the Python tree
> > structure
>
> Yes.

OK. Eventually we need to have direct documentation in there on how it
all hangs together - the DTD is not enough (indeed, is it still meant to
be correct?). But there's still time.

> I have some updating to do on the PEPs. Will specify the nodes.py
> implementation.

Neat.

> > Somehow I can't help feeling that the infoset is a natural for my
> > thinking of this "document" we're working with...
>
> Which "infoset"?

Erm - the general, wave-hands-in-the-air thingy that XML is just a
serialised encoding of [2] (see, I have some of the terms internalised
already). One of my "aha" moments with XML was (whilst reading [3])
realising that the representation being talked about (i.e., XML) is not
what matters; what matters is the underlying infoset (I should, of
course, have realised that earlier, but no one had *said* "conceptual
schema", so my reflexes hadn't been triggered).

My other "aha" moment was coming across the XPath (I think, or was it
XPointer?) material that lets you treat a document as a tree structure
and a linear structure *at the same time* - I *like* that way of thinking.
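The tree-and-linear idea can be seen in miniature with minidom: a pre-order walk visits nodes in what XPath 1.0 calls document order, and concatenating the text nodes met on that walk gives an element's string-value. This illustrates the concept only and is not an XPath implementation; document_order and string_value are names made up for the sketch.

```python
from xml.dom import minidom


def document_order(node):
    """Yield node and its descendants in document (pre-order) order."""
    yield node
    for child in node.childNodes:
        yield from document_order(child)


def string_value(node):
    """Concatenate descendant text in document order (cf. XPath string-value)."""
    return "".join(n.data for n in document_order(node)
                   if n.nodeType == n.TEXT_NODE)


dom = minidom.parseString("<doc><title>A</title><p>b<em>c</em>d</p></doc>")
# Tree view: element names reached by walking the hierarchy
print([n.tagName for n in document_order(dom.documentElement)
       if n.nodeType == n.ELEMENT_NODE])
# Linear view: the same nodes read as one flat run of text
print(string_value(dom.documentElement))
```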

> If the doc object is the constructor, we can change the interface to
> .asdom() to accept such an object, or default to an
> xml.dom.minidom.Document. That doesn't help with tree fragments
> though.

I'm not worried - last night it took me ten minutes (I made a mistake
the first time) to do what I wanted with the existing code, once I
realised that the tagName attribute was what drove the DOM element
output, and that I could therefore reset it from the default
("document") to what I wanted ("py:docstring") for the docstring.
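Roughly, the trick looks like this. The Element class is a hypothetical simplification (the real nodes.py classes differ, and the attribute is spelled tagname in this sketch); the point is only that when .asdom() reads the element name from an attribute, resetting that attribute on one instance changes its output. The namespace URI is a placeholder.

```python
from xml.dom import minidom

PY_NS = "http://example.org/hypothetical/py"  # placeholder namespace URI


class Element:
    """Hypothetical nodes.py-style element whose DOM name comes from `tagname`."""

    tagname = "document"  # class-level default

    def __init__(self, text):
        self.text = text

    def asdom(self, doc):
        # The output element name is read from self.tagname, so resetting
        # that attribute is enough to change what .asdom() emits.
        element = doc.createElement(self.tagname)
        element.appendChild(doc.createTextNode(self.text))
        return element


doc = minidom.Document()
docstring = Element("Return the answer.")
docstring.tagname = "py:docstring"  # instance attribute shadows the class default
node = docstring.asdom(doc)
node.setAttribute("xmlns:py", PY_NS)  # declare the prefix so the XML is well-formed
print(node.toxml())
```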

So the current http://www.tibsnjoan.co.uk/reST/pydps.tgz contains code
that outputs an XML representation of a document, *including* the
structure of the docstring, integrated neatly in.

> Anybody who doesn't want to use the DPS structure can use .asdom() to
> convert. If the need arises, we (they) can create a DOM->DPS tree
> converter too.

Ooh - pretty toy. But can I think of a *need* for one?
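Should a need turn up, the converter itself would be small. A minimal sketch, assuming a hypothetical Node class that just holds a tag name, attributes and children (real DPS classes would also need a mapping from tag names to node classes):

```python
from xml.dom import minidom


class Node:
    """Hypothetical DPS-style node: a tag name, attributes and children."""

    def __init__(self, tagname, attributes=None, children=None):
        self.tagname = tagname
        self.attributes = dict(attributes or {})
        self.children = list(children or [])

    def __repr__(self):
        return f"Node({self.tagname!r}, {self.attributes!r}, {self.children!r})"


def from_dom(element):
    """Convert a minidom element (and its subtree) into Node objects.

    Text is kept as plain strings; comments and other node types are dropped.
    """
    children = []
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            children.append(from_dom(child))
        elif child.nodeType == child.TEXT_NODE:
            children.append(child.data)
    attributes = {name: value for name, value in element.attributes.items()}
    return Node(element.tagName, attributes, children)


dom = minidom.parseString('<document id="d1"><paragraph>Hi</paragraph></document>')
tree = from_dom(dom.documentElement)
print(tree)
```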

> > > It would be very convenient if both the DPS and
> > > PyChecker used a common module for module parsing;
> > > this module could be maintained independently.
> >
> > Hmm. It should be doable - it's just that there would be a lot more
> > information extracted than we need.
>
> It's easier to prune a tree than to grow one! But until an alternative
> exists, we must grow our own.

The structure I am working with/towards at the moment has two phases
(three if one includes DOM) - well, maybe four:

1. Produce the AST using ``compiler``
2. Traverse the AST and produce appropriate objects to
   represent the higher level concepts - i.e., Package
   (still to come), Module, Class, Function, Method and
   (I realised late last night) probably Name.
3. For *our* purposes, produce a DPS tree from that
   - this is when we can ignore data we're not interested
   in.
4. If wished, produce a DOM tree from *that*.

Stage 2 is implemented by ``pydps.visit``, Stage 3 by ``pydps.nodes``,
and Stage 4 is a function within ``pydps.nodes``. At the moment, Stage 2
is not as general as it might be; I'm not sure whether it will get
expanded later on by me or by the PyChecker people.
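Stages 3 and 4 can be sketched in the same spirit: prune the stage 2 objects down to what documentation needs (here, hypothetically, anything with a docstring or with documented children) and then emit DOM elements under a "py" prefix. The tuple layout, the pruning rule and the namespace URI are all assumptions for illustration, not what pydps.nodes does.

```python
from xml.dom import minidom

PY_NS = "http://example.org/hypothetical/py"  # placeholder namespace URI

# Hypothetical stage-2 output as (kind, name, docstring, children) tuples
STAGE2 = ("module", "tiny", "A tiny module.", [
    ("class", "Spam", "A class.", [
        ("method", "eggs", "A method.", []),
        ("method", "_private", None, []),
    ]),
    ("function", "ham", None, []),
])


def prune(node):
    """Stage 3: drop undocumented objects (the data we're not interested in)."""
    kind, name, docstring, children = node
    kept = [pruned for pruned in map(prune, children) if pruned is not None]
    if docstring is None and not kept:
        return None
    return (kind, name, docstring, kept)


def to_dom(node, doc):
    """Stage 4: render the pruned tree as DOM elements in a 'py' vocabulary."""
    kind, name, docstring, children = node
    element = doc.createElement("py:" + kind)
    element.setAttribute("name", name)
    if docstring is not None:
        text = doc.createElement("py:docstring")
        text.appendChild(doc.createTextNode(docstring))
        element.appendChild(text)
    for child in children:
        element.appendChild(to_dom(child, doc))
    return element


doc = minidom.Document()
root = to_dom(prune(STAGE2), doc)
root.setAttribute("xmlns:py", PY_NS)  # declare the prefix on the root element
xml = root.toxml()
print(xml)
```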

Also, Stages 1 and 2 together should work as a Useful Example for the
use of the ``compiler`` tool.

Hmm - if we're aiming at Python 2.3 for adoption of DPS/reST, we should
probably also be campaigning for ``compiler`` being moved from Tools to
Lib.

> > I may change the name of "dom.py" - indeed, I may change all sorts
> > of things!
>
> Change is to be expected and embraced.

And it's now (perhaps confusingly) called ``nodes.py``.

> > "Bounce with the bunny. Strut with the duck.
> > Spin with the chickens now - CLUCK CLUCK CLUCK!"
> > BARNYARD DANCE! by Sandra Boynton
>
> We obviously have a parent of (recent, if not current) preschoolers.

One current, one recent.

> I prefer, "But not the armadillo!" That's *funny*! (First time ;-)

Our favourite is (well, you guessed) Barnyard Dance. It survives being
read/chanted an infinite number of times rather well. Indeed, we
recently bought a "keeping" copy to supplement the one that Thomas has
chewed.

But I've been a fan of Sandra Boynton for a Long Time - it used to be
possible to buy birthday/Christmas/etc. cards by her, which were
seriously neat (they appear and disappear from the shops over the
years). And, of course, there's *Chocolate: The Consuming Passion*.

Unfortunately, though, we don't have "But not the armadillo!" (at least
15 other titles, mind you, from a brief count on her publishers' pages).

(we do, however, as of a few weeks ago, have a tape of "Rhinoceros
Tap" - good stuff.)

.. [1] http://www.logilab.org/pypasax/index.html
.. [2] http://www.w3.org/TR/xml-infoset/ - heh, that's a new
   version just out...
.. [3] **Essential XML: Beyond Markup**
   by Don Box, John Lam, and Aaron Skonnard
   (Addison Wesley, ISBN: 0-201-70914-7)

Tibs

--
Tony J Ibbs (Tibs)      http://www.tibsnjoan.co.uk/
Dinosaurs looking right at YOU
to say GOODBYE because we're through.
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)