[Doc-SIG] Re: DPS and DOM trees

Thu, 30 Aug 2001 22:30:16 -0400

Tony J Ibbs (Tibs) <tony@lsl.co.uk> wrote on 2001-08-30 05:12:
> David Goodger wrote:
> > Why is a DOM tree a prerequisite for HTML output? I don't follow.
> 
> Hmm - I hadn't thought about it much.
> 
> Firstly, I want to be able to take advantage of the parsers other
> people are writing for the DPS project (I only need to provide the
> extra bits for the Python specific structures outwith the
> docstring), and as I understand it, those work off the DOM tree?

Actually, after trying to use the DOM (xml.dom.minidom) directly, I
saw what you & Garth & Guido meant by its inconvenience and coded up
the nodes.py class library instead. That let me make the classes
convenient to use, with polymorphism and other OO goodies.

I added DOM output (.asdom()) as an option, not because it was needed
for anything, but (violating XP, I know) because it was easy and
served as a proof of concept.
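For illustration, here is a minimal sketch of what "DOM output as an option" can look like. The class names (`Node`, `Text`, `Element`) and the helper `tree_asdom()` are hypothetical stand-ins, not the real nodes.py API; only the `xml.dom.minidom` calls are real. Each node class converts itself, so the recursion in `asdom()` is polymorphic:

```python
import xml.dom.minidom


class Node:
    """Hypothetical base class for a document tree node."""

    def asdom(self, document):
        raise NotImplementedError


class Text(Node):
    """A leaf node holding plain text."""

    def __init__(self, data):
        self.data = data

    def asdom(self, document):
        # Text leaves map onto DOM text nodes.
        return document.createTextNode(self.data)


class Element(Node):
    """An element with a tag name, attributes and child nodes."""

    def __init__(self, tagname, children=(), **attributes):
        self.tagname = tagname
        self.children = list(children)
        self.attributes = attributes

    def asdom(self, document):
        # Each child converts itself, whatever its class.
        element = document.createElement(self.tagname)
        for name, value in self.attributes.items():
            element.setAttribute(name, str(value))
        for child in self.children:
            element.appendChild(child.asdom(document))
        return element


def tree_asdom(root):
    """Hypothetical helper: wrap a converted tree in a fresh minidom Document."""
    document = xml.dom.minidom.Document()
    document.appendChild(root.asdom(document))
    return document


tree = Element("document", [Element("paragraph", [Text("Hello, DOM.")])])
print(tree_asdom(tree).documentElement.toxml())
# <document><paragraph>Hello, DOM.</paragraph></document>
```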

Nothing works off an actual DOM tree directly. The parser creates a
nodes.py tree, and unless necessity arises I don't see why the rest of
the DPS (output formatters included) shouldn't use the same as their
interface data structure.
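To make the point concrete (and to answer the original question): an HTML formatter can walk such a tree directly, with no DOM anywhere. This is a hypothetical sketch; `Text`, `Element`, `HTMLFormatter` and the tag mapping are assumptions for illustration, not DPS code:

```python
import html


class Text:
    """Hypothetical leaf node holding plain text."""

    def __init__(self, data):
        self.data = data


class Element:
    """Hypothetical element node: a tag name plus child nodes."""

    def __init__(self, tagname, *children):
        self.tagname = tagname
        self.children = list(children)


class HTMLFormatter:
    """Walks the node tree directly; no DOM is involved."""

    # Assumed mapping from tree element names to HTML tags.
    tags = {"document": "div", "section": "div", "title": "h1",
            "paragraph": "p", "emphasis": "em"}

    def format(self, node):
        if isinstance(node, Text):
            return html.escape(node.data)
        # Unknown element names fall back to a generic <span>.
        tag = self.tags.get(node.tagname, "span")
        body = "".join(self.format(child) for child in node.children)
        return f"<{tag}>{body}</{tag}>"


tree = Element("document",
               Element("title", Text("DPS & DOM")),
               Element("paragraph", Text("Plain "),
                       Element("emphasis", Text("text")), Text(".")))
print(HTMLFormatter().format(tree))
# <div><h1>DPS &amp; DOM</h1><p>Plain <em>text</em>.</p></div>
```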

> I'm still at a *very* early stage in navigating around the source
> code.

Not helped much by the paucity of documentation; apologies.

> Where is the actual structure of the document *as Python
> datastructure* defined?

PEP 258, "Intermediate Data Structure":

    A single intermediate data structure is used by the docstring
    processing system, in the interfaces between parsers, the DPS
    itself, and formatters.  It is not required that this data
    structure be used internally by any of the components.  This data
    structure is similar to a DOM tree whose schema is documented in
    an XML DTD ...

It started out being "a DOM tree", now it's "similar to a DOM tree".
Easy to miss.

> Is the stuff in nodes.py the class structure for the Python tree
> structure

Yes.

I have some updating to do on the PEPs. I'll specify the nodes.py
implementation there.

> Somehow I can't help feeling that the infoset is a natural for my
> thinking of this "document" we're working with...

Which "infoset"?

> > > I would thus like to propose that the user provide a DOM document
> > > instance to the DOM generator methods
> >
> > Is there a reason to provide a DOM *document instance* rather than a
> > constructor?
> 
> Hmm - I thought that in DOM the document instance (or perhaps class)
> *was* the constructor - but I (think I) see what you mean.

If the doc object is the constructor, we can change the interface to
.asdom() to accept such an object, or default to an
xml.dom.minidom.Document. That doesn't help with tree fragments
though.
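A sketch of that interface change, assuming a hypothetical `Element` class: the caller may pass in any DOM document, whose standard `createElement()` and `createTextNode()` factory methods then build the nodes; otherwise a fresh `xml.dom.minidom.Document` is used. The returned element is not attached to the document; the caller decides where it goes:

```python
import xml.dom.minidom


class Element:
    """Hypothetical tree node; children are Element instances or strings."""

    def __init__(self, tagname, *children):
        self.tagname = tagname
        self.children = children

    def asdom(self, document=None):
        # Use the caller's document if given; otherwise default to minidom.
        if document is None:
            document = xml.dom.minidom.Document()
        element = document.createElement(self.tagname)
        for child in self.children:
            if isinstance(child, str):
                element.appendChild(document.createTextNode(child))
            else:
                # Pass the same document down so every node shares one owner.
                element.appendChild(child.asdom(document))
        return element


doc = xml.dom.minidom.Document()
root = Element("document", Element("paragraph", "text")).asdom(doc)
doc.appendChild(root)
print(doc.documentElement.toxml())
# <document><paragraph>text</paragraph></document>
```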

> Interestingly, I see directives as being split similarly - for a
> typical (non-pragma) directive, one needs both code to parse the
> directive content (or to manipulate what DPS/reST has already
> parsed), and code to format it within a formatter. Clearly,
> formatters should cope well with directives they *don't* understand,
> though!

I think most directives won't survive (as directives) past the parsing
stage (they can't, since they're a *parser construct*). Instead,
they'll transform their contents into elements on the doc tree. Only
unknown directives will survive to any later stage, and be formatted
in a generic "directive" way, perhaps as a system warning.

Now, the newly transformed directive *contents* may be specialized
elements which require further processing later on, but they're no
longer raw directives.
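Roughly, the parser-side handling might look like the following. Everything here (`Element`, the `DIRECTIVES` registry, the handler signatures, the `system_warning` element name) is hypothetical; the point is only that known directives are replaced by ordinary elements at parse time and unknown ones survive in a generic form:

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical doc-tree element: a tag name, children and attributes."""
    tagname: str
    children: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)


def image_directive(argument, content):
    # A known directive becomes an ordinary element; no trace of the
    # directive itself is left in the tree.
    return Element("image", attributes={"uri": argument})


def note_directive(argument, content):
    return Element("note", children=[Element("paragraph", children=[content])])


# Hypothetical registry of directives the parser knows about.
DIRECTIVES = {"image": image_directive, "note": note_directive}


def parse_directive(name, argument, content):
    """Called at parse time: transform the directive or flag it as unknown."""
    handler = DIRECTIVES.get(name)
    if handler is not None:
        return handler(argument, content)
    # Unknown directives survive in a generic form, here as a warning.
    return Element("system_warning",
                   children=[f'Unknown directive type "{name}".'],
                   attributes={"directive": name})


print(parse_directive("image", "picture.png", ""))
print(parse_directive("frobnicate", "", "raw text").tagname)
```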

> This is, of course, another reason to consider the DOM as a useful
> tool - people can be expected to learn about the DOM independently
> of us, and to understand (in general) how to manipulate DOM trees.
> There are standard ways of documenting the content of a DOM tree
> (DTD, XMLSchema, etc.). There are Useful Tools.
> 
> Asking each person who wants to write a thingy to understand *our*
> tree structure in depth may be more powerful in some ways, but more
> onerous in others.

Anybody who doesn't want to use the DPS structure can use .asdom() to
convert. If the need arises, we (they) can create a DOM->DPS tree
converter too.
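Such a converter need not be large. A hypothetical sketch, using real `xml.dom.minidom` parsing but stand-in `Text` and `Element` classes in place of the actual nodes.py ones:

```python
import xml.dom.minidom
from xml.dom import Node as DOMNode


class Text:
    """Hypothetical tree leaf holding plain text."""

    def __init__(self, data):
        self.data = data


class Element:
    """Hypothetical tree element with a tag name, attributes and children."""

    def __init__(self, tagname, attributes=None, children=None):
        self.tagname = tagname
        self.attributes = attributes or {}
        self.children = children or []


def from_dom(dom_node):
    """Convert a DOM element (and its descendants) into the class tree."""
    attributes = dict(dom_node.attributes.items())
    children = []
    for child in dom_node.childNodes:
        if child.nodeType == DOMNode.ELEMENT_NODE:
            children.append(from_dom(child))
        elif child.nodeType == DOMNode.TEXT_NODE:
            children.append(Text(child.data))
        # Comments, processing instructions, etc. are simply dropped here.
    return Element(dom_node.tagName, attributes, children)


dom = xml.dom.minidom.parseString(
    '<document><paragraph id="p1">Hello <emphasis>there</emphasis>'
    '</paragraph></document>')
tree = from_dom(dom.documentElement)
para = tree.children[0]
print(para.tagname, para.attributes, repr(para.children[0].data))
```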

> > It would be very convenient if both the DPS and
> > PyChecker used a common module for module parsing;
> > this module could be maintained independently.
> 
> Hmm. It should be doable - it's just that there would be a lot more
> information extracted than we need.

It's easier to prune a tree than to grow one! But until an alternative
exists, we must grow our own.
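A sketch of the pruning idea, assuming a hypothetical general-purpose parser that yields `Node` objects tagged with a `kind`; the DPS would keep only the kinds it cares about and discard the rest:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node from a general-purpose module parser."""
    kind: str                      # e.g. "module", "class", "function", "import"
    name: str
    children: list = field(default_factory=list)


def prune(node, keep):
    """Return a copy of the tree containing only node kinds in `keep`.

    The root is always kept; children of a discarded node go with it.
    The original tree is left untouched.
    """
    kept = [prune(child, keep) for child in node.children if child.kind in keep]
    return Node(node.kind, node.name, kept)


full = Node("module", "spam", [
    Node("import", "os"),
    Node("class", "Eggs", [Node("function", "__init__"),
                           Node("assignment", "count")]),
    Node("function", "main"),
])
docs_view = prune(full, keep={"class", "function"})
print([child.name for child in docs_view.children])  # ['Eggs', 'main']
```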

> I may change the name of "dom.py" - indeed, I may change all sorts
> of things!

Change is to be expected and embraced.

> --
> Tony J Ibbs (Tibs)      http://www.tibsnjoan.co.uk/
> "Bounce with the bunny. Strut with the duck.
> Spin with the chickens now - CLUCK CLUCK CLUCK!"
> BARNYARD DANCE! by Sandra Boynton

We obviously have a parent of (recent, if not current) preschoolers.

I prefer "But not the armadillo!" That's *funny*! (First time ;-)

-- 
David Goodger    goodger@users.sourceforge.net    Open-source projects:
 - Python Docstring Processing System: http://docstring.sourceforge.net
 - reStructuredText: http://structuredtext.sourceforge.net
 - The Go Tools Project: http://gotools.sourceforge.net