[Doc-SIG] Re: DPS and DOM trees

Thu, 30 Aug 2001 00:31:48 -0400

Tony J Ibbs (Tibs) <tony@lsl.co.uk> wrote on 2001-08-29 04:21:
> I've now got to a stage with dps_visit.py where I'm starting to play
> with actually integrating DPS/reST into it.

Great!

> Yes, I know it isn't finished yet (!), but I want to flesh out what
> it's going to do before fining it down

That's the way the rest of the development is going too, so no
problem; please go ahead.

> (putting that another way, I want to see some HTML output from what
> I'm doing, and that means generating a DOM tree first)

Why is a DOM tree a prerequisite for HTML output? I don't follow.

> I would thus like to propose that the user provide a DOM document
> instance to the DOM generator methods

Is there a reason to provide a DOM *document instance* rather than a
constructor? The reason I ask is that in my implementation of
nodes.py, if you say ``document.asdom()`` you'll get a DOM tree rooted
at a DOM Document object, but if you say ``element.asdom()`` you'll
get a tree fragment, rooted at the DOM element itself, not at
Document. I think this will be useful when we get into tree
transformations.
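Here is a minimal sketch that makes the document/fragment distinction concrete. The `Element` and `Document` classes are hypothetical stand-ins, not the real nodes.py classes. The DOM calls are limited to createElement(), createTextNode(), setAttribute() and appendChild() on minidom objects.

```python
from xml.dom import minidom


class Element:
    """Hypothetical stand-in for a nodes.py element (not the real class)."""

    def __init__(self, tagname, *children, **attributes):
        self.tagname = tagname
        self.children = list(children)  # Element instances or plain strings
        self.attributes = attributes

    def _build(self, domroot):
        # Create this element and its subtree, using domroot as the node factory.
        node = domroot.createElement(self.tagname)
        for name, value in self.attributes.items():
            node.setAttribute(name, str(value))
        for child in self.children:
            if isinstance(child, str):
                node.appendChild(domroot.createTextNode(child))
            else:
                node.appendChild(child._build(domroot))
        return node

    def asdom(self):
        # Fragment: the returned root is the DOM element itself. A scratch
        # Document is still needed as the factory (its ownerDocument).
        return self._build(minidom.Document())


class Document(Element):
    def asdom(self):
        # Whole tree: the returned root is a DOM Document object.
        domroot = minidom.Document()
        domroot.appendChild(self._build(domroot))
        return domroot
```

`Document("document", Element("paragraph", "Hello")).asdom()` gives a minidom Document. Calling `.asdom()` on the inner `Element` alone gives a bare DOM element that a later transformation could graft somewhere else.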

Will something like this work? ::

    document = Parser().parse(data)
    domtree = document.asdom(your_favourite_DOM_implementation_here)

Currently nodes.py uses the Document, Element, and Text constructors
of minidom. The Document object's appendChild(), createTextNode(), and
createElement() methods are used. The Element object's setAttribute()
and appendChild() methods are used.

Are the DOM implementations standard enough for this kind of
interoperability? If so, then the work is done (see the CVS or
snapshot). That was easy. Calling document.asdom() without an argument
defaults to the xml.dom.minidom implementation.
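If the implementations do cooperate, the hook could be as small as the sketch below. It assumes a tuple tree of `(tagname, attributes, children)` rather than real nodes.py objects. It gets the Document through the standard `xml.dom` registry (`getDOMImplementation()` plus the DOM Level 2 `createDocument()` call), so any implementation that exposes those calls could in principle be passed in. Nothing beyond minidom is assumed to work.

```python
import xml.dom


def asdom(tree, impl=None):
    """Convert a (tagname, attributes, children) tuple tree to a DOM Document.

    ``impl`` is any DOM implementation object; None falls back to minidom.
    """
    if impl is None:
        impl = xml.dom.getDOMImplementation("minidom")
    tagname, attributes, children = tree
    # createDocument() also creates and attaches the root element.
    domroot = impl.createDocument(None, tagname, None)
    _fill(domroot, domroot.documentElement, attributes, children)
    return domroot


def _fill(domroot, node, attributes, children):
    # Only the small method subset listed above is used here.
    for name, value in attributes.items():
        node.setAttribute(name, value)
    for child in children:
        if isinstance(child, str):
            node.appendChild(domroot.createTextNode(child))
        else:
            tagname, attrs, kids = child
            element = domroot.createElement(tagname)
            node.appendChild(element)
            _fill(domroot, element, attrs, kids)
```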

If that won't do it, I'll leave it to you to figure out how it should
be done. As xml.dom.minidom is the only applicable DOM implementation
in the standard library, I haven't tried any others and have no
experience with them.

> Secondly, the current nodes.py assumes that the document produced
> will map to a single docstring - or at least so it looks to me (after
> a short time reading it, admittedly). This is a disadvantage if one
> is trying to produce a DOM tree with (for instance) a Python module
> as the "top" of the document tree, and docstrings hanging off
> various points.
> 
> Luckily, if the user is required to provide the DOM document to the
> code, then this problem goes away - one just adds children to the
> relevant element in the tree.
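If the caller does own the DOM document, hanging docstrings off a module element might look something like this minimal sketch. `docstring_fragment()` is a hypothetical placeholder for whatever turns one parsed docstring into DOM, and the `module`/`object`/`docstring` element names are made up for illustration. One DOM subtlety matters here. The DOM spec does not allow appending a node created by a different Document, so a fragment built elsewhere is copied in with `importNode()` before it is attached.

```python
from xml.dom import minidom


def docstring_fragment(text):
    """Hypothetical: turn one docstring into a standalone DOM fragment."""
    scratch = minidom.Document()
    element = scratch.createElement("docstring")
    element.appendChild(scratch.createTextNode(text))
    return element


def build_module_tree(module_name, docstrings):
    """Hang per-object docstring fragments off a caller-owned module tree.

    ``docstrings`` maps object names to docstring text (an assumption).
    """
    domroot = minidom.Document()
    module = domroot.createElement("module")
    module.setAttribute("name", module_name)
    domroot.appendChild(module)
    for name, text in docstrings.items():
        holder = domroot.createElement("object")
        holder.setAttribute("name", name)
        # The fragment belongs to another Document; importNode() makes a
        # deep copy owned by domroot, which can then be attached.
        holder.appendChild(domroot.importNode(docstring_fragment(text), True))
        module.appendChild(holder)
    return domroot
```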

The Python docstring mode model that's evolving in my mind goes
something like this:

1. Extract the docstring/namespace tree from the module(s) and/or
   package(s).

2. Run the parser on each docstring in turn, producing a forest of
   trees (internal data structure as per nodes.py).

3. Run various transformations on the individual docstring trees.
   Examples: resolving cross-references; resolving hyperlinks;
   footnote auto-numbering; first field list -> bibliographic
   elements.

4. Join the docstring trees together into a single tree, running more
   transformations (such as creating various sections like "Module
   Attributes", "Functions", "Classes", "Class Attributes", etc.; see
   the DPS spec/ppdi.dtd).

5. Pass the resulting unified tree to the output formatter.
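Steps 2 through 5 amount to a simple driver. In this sketch every stage is a plain callable supplied by the caller, and step 1 is assumed to have already produced a mapping of dotted names to docstring text. The toy `parse`, `count_paragraphs`, `join` and `write` functions exist only so the driver can be run. None of them is DPS code.

```python
def run_pipeline(docstrings, parse, per_tree_transforms, join,
                 joined_transforms, write):
    """Hypothetical driver for steps 2-5 of the model above."""
    # Step 2: parse each docstring separately, giving a forest of trees.
    forest = {name: parse(text) for name, text in docstrings.items()}
    # Step 3: transforms that only need one docstring tree at a time.
    for transform in per_tree_transforms:
        forest = {name: transform(tree) for name, tree in forest.items()}
    # Step 4: join the forest, then run transforms that need the whole tree.
    tree = join(forest)
    for transform in joined_transforms:
        tree = transform(tree)
    # Step 5: hand the unified tree to the output formatter.
    return write(tree)


# Toy stand-ins so the driver can be exercised; all hypothetical.
def parse(text):
    return {"paragraphs": text.split("\n\n")}


def count_paragraphs(tree):
    return {**tree, "count": len(tree["paragraphs"])}


def join(forest):
    return {"module": sorted(forest.items())}


def write(tree):
    return "\n".join(f"{name}: {t['count']} paragraph(s)"
                     for name, t in tree["module"])
```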

I've had trouble reconciling the roles of input parser and output
formatter with the idea of "modes". Does the mode govern the
transformation of the input, the output, or both? Perhaps the mode
should be split into two.

For example, say the source of our input is a Python module. Our
"input mode" should be "Python Docstring Mode". It discovers (from
``__docformat__``) that the input parser is "reStructuredText". If we
want HTML, we'll specify the "HTML" output formatter. But there's a
piece missing. What *kind* or *style* of HTML output do we want?
PyDoc style, LibRefMan style, or something else? (Many people will want
to specify and control their own style.) Is the output style specific to a
particular output format (XML, HTML, etc.)? Is the style specific to
the input mode? Or can/should they be independent?

I envision interaction between the input parser, an "input mode" (would
control steps 1, 2, & 3), a "transformation style" (would control step
4), and the output formatter. The same intermediate data format would
be used between each of these, gaining detail as it progresses.
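One way to try out the split is to make each piece an independent, swappable callable. In this sketch `Pipeline`, `PARSERS`, `python_docstring_mode`, `pydoc_like_style` and `html_formatter` are all hypothetical names. The sketch also assumes that the ``__docformat__`` value has the form `"reStructuredText en"`, with the parser name first.

```python
import html
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Pipeline:
    """Hypothetical split of a "mode" into three independent choices."""
    input_mode: Callable[[dict], Dict[str, object]]  # steps 1-3
    style: Callable[[Dict[str, object]], object]     # step 4
    formatter: Callable[[object], str]               # output

    def run(self, source):
        return self.formatter(self.style(self.input_mode(source)))


# Hypothetical parser registry keyed by the first word of __docformat__.
PARSERS = {
    "restructuredtext": lambda text: ("rst", text.strip()),
    "plaintext": lambda text: ("plain", text),
}


def python_docstring_mode(namespace):
    """Input mode: pick the parser from the module's __docformat__."""
    fmt = namespace.get("__docformat__", "plaintext").split()[0].lower()
    parse = PARSERS[fmt]
    return {name: parse(value.__doc__)
            for name, value in namespace.items()
            if callable(value) and value.__doc__}


def pydoc_like_style(forest):
    # Transformation style: here, just a flat alphabetical listing.
    return sorted(forest.items())


def html_formatter(tree):
    rows = "".join(f"<dt>{html.escape(name)}</dt><dd>{html.escape(body)}</dd>"
                   for name, (_, body) in tree)
    return f"<dl>{rows}</dl>"
```

Swapping `pydoc_like_style` for another style function leaves the input mode and the HTML formatter untouched. That is the independence the questions above are asking about.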

This requires thought.

> would it be a Good Thing for me to work on my own version of
> nodes.py,

I don't see nodes.py taking on these responsibilities (the ones you
mentioned, not my pie-in-the-sky ramblings above). Instead, I see
these functions being done by a set of tree transforms (using what, I
don't know yet) and transformation modes (each composed of a subset
of the aforementioned tree transforms).
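Here is a rough illustration of transforms and transformation modes. Each transform below is a function from tree to tree, and a mode is just an ordered list of transforms. The dict-based tree, the `walk()` helper, and the mode names are assumptions for the sketch, not a proposed nodes.py design.

```python
def walk(node):
    """Yield node and all descendant element dicts, in document order."""
    yield node
    for child in node.get("children", []):
        if isinstance(child, dict):
            yield from walk(child)


def autonumber_footnotes(tree):
    """Give each auto-numbered footnote the next number, in document order."""
    counter = 0
    for node in walk(tree):
        if node["tag"] == "footnote" and node.get("auto"):
            counter += 1
            node["number"] = counter
    return tree


def drop_comments(tree):
    """Remove comment elements (children are filtered before they are visited)."""
    for node in walk(tree):
        node["children"] = [c for c in node.get("children", [])
                            if not (isinstance(c, dict) and c["tag"] == "comment")]
    return tree


# A transformation mode is an ordered subset of the available transforms.
TRANSFORMATION_MODES = {
    "minimal": [autonumber_footnotes],
    "full": [drop_comments, autonumber_footnotes],
}


def apply_mode(tree, mode):
    for transform in TRANSFORMATION_MODES[mode]:
        tree = transform(tree)
    return tree
```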

> and integrate it back in later on (bearing in mind my CVS-lessness)?

If you can't use CVS, please use the daily snapshots and send me
patches.

> Thirdly, and separately, do we intend to support Python 1.5.2, or
> are we starting support with 2.0?

I've absorbed the new features of 2.0 too completely to want to revert
to 1.5.2. String methods and augmented assignment have become second
nature. Of course, it's possible to back-port to 1.5.2 (the new syntax
is all so much sugar, if sweet). If enough need arises, fine.

A larger issue might be the docstring extraction mechanism. Won't
the code that uses compiler.py be very
version-dependent? It would be very convenient if both the DPS and
PyChecker used a common module for module parsing; this module could
be maintained independently.
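For comparison, present-day Python ships the `ast` module (the `compiler` package was removed in Python 3), and `ast.get_docstring()` handles extraction without importing or running the module. The sketch below assumes docstrings are wanted only for the module, its classes and its functions, keyed by hypothetical dotted names. It says nothing about what a shared DPS/PyChecker module would actually look like.

```python
import ast


def extract_docstrings(source, module_name="<module>"):
    """Map dotted names to docstrings, parsing the source without executing it."""
    tree = ast.parse(source)
    found = {}

    def visit(node, prefix):
        doc = ast.get_docstring(node)  # None when there is no docstring
        if doc is not None:
            found[prefix] = doc
        for child in node.body:
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef,
                                  ast.ClassDef)):
                visit(child, f"{prefix}.{child.name}")

    visit(tree, module_name)
    return found
```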

> Tibs (who narrowly avoided starting to write docstrings for existing
> code last night, instead of writing new code)

We need more of those too!

/David (who tends to write docstrings when confused by his own code)

-- 
David Goodger    goodger@users.sourceforge.net    Open-source projects:
 - Python Docstring Processing System: http://docstring.sourceforge.net
 - reStructuredText: http://structuredtext.sourceforge.net
 - The Go Tools Project: http://gotools.sourceforge.net