[Doc-SIG] Re: Adding <group>, losing <package_section> and their ilk

David Goodger goodger@users.sourceforge.net
Mon, 19 Nov 2001 23:24:05 -0500


(I've dropped the washing machine reference from the title because
it's TOO SILLY!)

Tony J Ibbs (Tibs) wrote:
> Whilst David's diagram is not *quite* how I see the process, it's
> close enough for this purpose. Thus pydps [#pydps]_ might be shown
> as::
>
>   +--------+     +--------------+     +------------+     +---------+
>   | READER | --> | transform.py | --> | transforms | --> | html.py |
>   +--------+     +--------------+     +------------+     +---------+
>       |                                                      |
>   +----------+                                     +---------------+
>   | visit.py |                                     | buildhtml.py  |
>   +----------+                                     +---------------+
>
> The "READER" is implicit in the main utility (currently
> ``pydps.py``), and locates the relevant Python files. It then
> uses ``visit.py`` to generate a tree representing the Python code

I'd consider ``visit.py`` to be part of the Reader.

> So, in summary:
>
>    1. ``transform.py`` generates a *normal* DPS tree. It doesn't
>       use any "odd" nodes (except <group> - but we'll discuss
>       that later on). This means that it should be possible to
>       plug in any other writer, and produce a different format as
>       output - a very significant advantage.

Substitute "The Reader" for "transform.py" and: Yes. In the revised
model, the Reader subsumes the transforms.

>    2. ``html.py`` expects to be given a *normal* DPS tree. This
>       means that it should be usable by any other utility that
>       also provides a normal DPS tree - again an advantage.

Ditto "The Writer" for "html.py": Yes.

> The problem
> -----------
> But there is a clash in requirements here. Whilst it is very nice
> to be able to represent the Python information as "normal" DPS
> (guaranteeing that anyone can do useful things with it), there is
> some need to transfer information that goes beyond that. There
> are two main reasons for wanting to do this:
>
>   * Data mining
>   * Presentation
>
> For the first, although DPS/reST/pydps is primarily about
> producing human-viewable documentation, it might also be nice to
> be able to extract parts of it automatically, for various
> purposes - for instance, retrieve just the information about
> which classes inherit from others. This information will, in part,
> be obvious from the text chosen within the document (a title like
> "Class Fred" might be taken to be useful, for instance!), but it
> would be nice to give a bit more help.

Wouldn't that be better left to a tool accessing the information
directly? (Correct me if I'm wrong...) Your visit.py extracts
structural information from the AST via compiler.py, and transforms it
into the trunk and major branches of the doc tree, onto which the
parsed docstrings get attached like minor branches and leaves. The doc
tree is one step removed from the AST; I would think that a data
mining application would want to start as close to the source AST data
as possible.

I wouldn't want to try to extract such structural information from
textual clues. There are certain limits on such operations (as we all
well know).
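Here is a minimal sketch of the direct approach, for illustration only. It mines inheritance information straight from the AST, without going through the doc tree or relying on textual clues. It uses Python's standard ``ast`` module as a stand-in for the ``compiler`` package that visit.py builds on. The function name ``inheritance_map`` is hypothetical.

```python
import ast


def inheritance_map(source: str) -> dict[str, list[str]]:
    """Map each class defined in `source` to its base-class expressions.

    Works directly on the AST; no doc tree and no textual guesswork.
    """
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            # ast.unparse (Python 3.9+) turns a base expression, e.g. an
            # Attribute node, back into source text such as "mixins.Printable".
            result[node.name] = [ast.unparse(base) for base in node.bases]
    return result


if __name__ == "__main__":
    sample = (
        "class Fred: pass\n"
        "class Jim(Fred): pass\n"
        "class Sheila(Jim, mixins.Printable): pass\n"
    )
    print(inheritance_map(sample))
```

A real data-mining tool would also need to resolve imported names and handle nested classes. The point here is only that the structural facts are already explicit in the AST.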

> For the second, it's relatively difficult to produce better
> layout for DPS Python documentation without more information to
> work on. If one uses the (rather garish) default presentation
> produced by pydps (and no, I'm not saying that's a *nice*
> presentation, but it is the one I've got as an example), it is
> clearly useful to be able to:
>
>   1. group together the package/class/method/etc title and its
>      full name/path/whatever
>
>   2. group together a method or function's signature and its
>      docstring

I would say that's a job for the Reader and its transforms, not for the
Writer.

Let me elaborate a bit more on the latest diagram::

           1,3,5                               6,8
           +--------+                          +--------+
           | READER | =======================> | WRITER |
           +--------+ (purely presentational)  +--------+
            //    \                              /    \
           //      \                            /      \
    2     //     4  \               7          /     9  \
    +--------+   +------------+     +------------+   +---------------+
    | PARSER |...| reader     |     | writer     |...| deployment    |
    +--------+   | transforms |     | transforms |   |               |
                 |            |     |            |   | - one file    |
                 | - docinfo  |     | - styling  |   | - many files  |
                 | - titles   |     | - writer-  |   | - object data |
                 | - linking  |     |   specific |   |   structure   |
                 | - lookups  |     | - etc.     |   +---------------+
                 | - reader-  |     +------------+
                 |   specific |
                 | - parser-  |
                 |   specific |
                 | - layout   |
                 | - etc.     |
                 +------------+

I've added double-width lines between reader & parser and between
reader & writer, meaning that data sent along these paths should be
standard (pure & unextended) DPS doc trees. Single-width lines signify
that internal tree extensions are OK (but must be supported internally
at both ends), and may in fact be totally unrelated to the DPS doc
tree structure. I've added "reader-specific" and "layout" transforms
to the list of transforms. BTW, these transforms are not necessarily
all in one directory; it's a nebulous grouping (it's hard to draw
ASCII clouds).

I've also added numbers to show the path a document would take through
the code.
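The toy sketch below makes the numbered path concrete. Everything in it is hypothetical and far simpler than the real dps code, including the ``Node`` class, the ``STANDARD`` set, and the stage functions. It only shows that ``check_standard`` runs at the two double-width boundaries, while the stages in between are free to work however they like.

```python
from dataclasses import dataclass, field


# Hypothetical minimal doc-tree node; not the real dps.nodes API.
@dataclass
class Node:
    tagname: str
    children: list = field(default_factory=list)
    text: str = ""


# Assumed (tiny) set of standard, generic element names.
STANDARD = {"document", "section", "title", "paragraph", "literal_block"}


def check_standard(node):
    """Enforce a double-width boundary: only standard elements may pass."""
    if node.tagname not in STANDARD:
        raise ValueError(f"non-standard element <{node.tagname}> at boundary")
    for child in node.children:
        check_standard(child)


def parse(text):                 # 2: Parser builds a standard tree
    doc = Node("document")
    for para in text.split("\n\n"):
        doc.children.append(Node("paragraph", text=para.strip()))
    return doc


def reader_transforms(doc):      # 4: docinfo, titles, linking, layout, ...
    doc.children.insert(0, Node("title", text="Untitled"))
    return doc


def read(text):                  # 1, 3, 5: the Reader drives steps 2 and 4
    doc = parse(text)
    check_standard(doc)          # Parser => Reader
    doc = reader_transforms(doc)
    check_standard(doc)          # Reader => Writer
    return doc


def writer_transforms(doc):      # 7: styling, writer-specific (no-op here)
    return doc


def deploy(doc):                 # 9: "one file", modelled as one string
    return "\n".join(
        f"<{c.tagname}>{c.text}</{c.tagname}>" for c in doc.children
    )


def write(doc):                  # 6, 8: the Writer drives steps 7 and 9
    return deploy(writer_transforms(doc))


if __name__ == "__main__":
    print(write(read("Hello.\n\nWorld.")))
```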

> David's original approach to this was to introduce a host of
> Python specific tags into ``nodes.py`` [#nodes]_ - for instance::
>
>     package_section
>     module_section
>     ...
>     instance_attribute_section
>     ...
>     parameter_item
>     parameter_tuple
>     ...
>     package
>     module
>     class_attribute
>
> There are several problems with this approach. Perhaps the most
> serious is that *all* generic DPS writers need to understand
> this host of elements that are only relevant to Python. Clearly,
> someone writing a writer for other purposes may be reluctant to
> go to this (to them) redundant effort.

I think the Python-specific extensions should be removed from
dps.nodes and relocated to the PySource Reader for Reader-internal use
only. The DTDs have been split since the beginning: gpdi.dtd for
generic elements, and ppdi.dtd for Python-specific stuff.

Writer modules should support only the standard doc tree elements, and
should be considered "presentation only". I'm sure there are lots of
changes to the doc tree structure that can be made for the benefit of
Writers. The only reason there aren't any presentation-oriented
attributes on elements is that nobody's added them yet.
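As a rough illustration of how a Reader could keep its Python-specific elements to itself, here is a hypothetical lowering transform. It would run as the last reader transform and rewrite a Reader-internal <package_section> into a standard <section> whose <title> carries some boilerplate. The ``Node`` class and the exact mapping are assumptions made for the sketch.

```python
from dataclasses import dataclass, field


# Hypothetical minimal node type, for illustration only.
@dataclass
class Node:
    tagname: str
    children: list = field(default_factory=list)
    text: str = ""


def lower(node):
    """Rewrite Reader-internal Python-specific elements as standard ones.

    Assumed mapping: <package_section> becomes <section>, and its leading
    <package> child (content model "package, %structure.model;") becomes
    the section's <title>, with "Package " added as boilerplate.
    """
    children = [lower(child) for child in node.children]
    if node.tagname == "package_section":
        package, *body = children
        title = Node("title", text=f"Package {package.text}")
        return Node("section", [title, *body])
    return Node(node.tagname, children, node.text)


if __name__ == "__main__":
    tree = Node("document", [
        Node("package_section", [Node("package", text="dps"),
                                 Node("paragraph", text="Docstring tools.")]),
    ])
    print(lower(tree))
```

The Writer then only ever sees <section> and <title>, so it needs no knowledge of packages at all.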

> From my point of view, an immediate problem is that the set of
> elements is not *quite* what I want - which means working towards
> a set of patches for ``nodes.py`` and the relevant DTD, and
> getting David to agree to them (well, that last is a useful
> process to have in place, but still). Since I'm not likely to get
> it right immediately, this is a repetitive process.

If we extract the Python-specific stuff out of dps.nodes, you'll have
a free hand to do whatever you like within the PyDPS/PySource Reader.

> Lastly, one might imagine someone from another programming
> language domain adopting DPS/reST. One can expect them to be
> frustrated if the set of constructs provided for Python doesn't
> quite match the set of constructs required to describe their
> language in a natural manner.

So they write the LanguageXSource Reader with their own
language-specific extensions. They'd have to anyway, since compiler.py
won't help them.

> Groups and "style"
> ------------------
> The first thing that I realised was that, for convenience of
> output at least, I wanted to be able to group elements together -
> or, in terms of the DPS tree, insert an arbitrary "subroot"
> within the tree to 'abstract' a branch of the tree.
>
> This is particularly useful for linking together the parts of the
> text that deal with (for instance) attribution, or unusual
> globals, without having to embed yet another section.

That was the purpose of the Python-specific elements in ppdi.dtd. But
they were just my first guess at what would be needed; feel free to
modify as necessary.

> Once one has a <group> element, it is natural to annotate it with
> *what* it is a group of/for. I chose the arbitrary term "style" -
> partly because it is not used in HTML/XML for any purpose I am
> aware of.
>
> And once one has the "style" attribute, it becomes possible to
> use it elsewhere - most notably in <section> elements, saving the
> need for a myriad of different sections for different purposes.

I can see using a "style" attribute to communicate formatting
information between the Reader and the Writer (which would be free to
ignore the advice if not understood).
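Here is a sketch of that advisory use of "style", assuming a hypothetical HTML Writer and a toy ``Node`` class. Styles the Writer recognises become CSS classes. Anything else is dropped silently rather than treated as an error.

```python
from dataclasses import dataclass, field


# Hypothetical node type carrying an attributes dict, e.g. {"style": ...}.
@dataclass
class Node:
    tagname: str
    children: list = field(default_factory=list)
    text: str = ""
    attributes: dict = field(default_factory=dict)


# The styles this particular (hypothetical) HTML Writer understands.
KNOWN_STYLES = {"signature": "sig", "attribution": "attrib"}


def render(node):
    inner = node.text + "".join(render(child) for child in node.children)
    if node.tagname == "group":
        css = KNOWN_STYLES.get(node.attributes.get("style"))
        # Unknown or missing style: ignore the advice, emit a plain <div>.
        if css is None:
            return f"<div>{inner}</div>"
        return f'<div class="{css}">{inner}</div>'
    if node.tagname == "paragraph":
        return f"<p>{inner}</p>"
    return inner  # other elements: just their content, for brevity
```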

I'd much rather have a bunch of different section-level classes than
one class with a "style" attribute. With section-level classes, we can
use polymorphism to advantage.
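To illustrate the polymorphism point, here is a hypothetical sketch. These are not the dps classes. Each section class overrides only ``heading()``, and a Writer calls ``render()`` without knowing which kind of section it has. With a single section class plus a "style" attribute, the same logic would become an if/elif chain on the attribute value.

```python
class Section:
    """Generic section: a title plus body text (hypothetical class)."""

    def __init__(self, title, body=""):
        self.title = title
        self.body = body

    def heading(self):
        return self.title

    def render(self):
        # Shared logic; only heading() varies between subclasses.
        return f"== {self.heading()} ==\n{self.body}"


class PackageSection(Section):
    def heading(self):
        return f"Package {self.title}"


class ClassSection(Section):
    def __init__(self, title, bases=(), body=""):
        super().__init__(title, body)
        self.bases = list(bases)

    def heading(self):
        bases = f"({', '.join(self.bases)})" if self.bases else ""
        return f"Class {self.title}{bases}"


if __name__ == "__main__":
    for section in [PackageSection("dps"), ClassSection("Fred", ["Base"], "Docs.")]:
        print(section.render())
```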

Plus, the classes are significantly different. Here's one::

    <!ELEMENT package_section (package, %structure.model;)>

The "package" element is just a replacement for a generic section's
"title". It is easier to hang boilerplate text or formatting onto a
specific element though.

But here's another::

    <!ELEMENT class_section
        (class, inheritance_list?, parameter_list?, %structure.model;)>

The "class" element contains the class name; it's the section title.
But then there's an inheritance list and parameter list; much more
interesting. If you had a generic "group" element, *and* you wanted an
"inheritance_list" sometimes, you'd have to allow it anywhere, even
when it's not applicable. You'd either end up with a freeform doc tree
structure or one that's impossible to validate.
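The validation argument can also be shown as a sketch under a simplifying assumption: the ``class_section`` content model can be checked mechanically against a node's child element names. Here ``%structure.model;`` is approximated as "zero or more section or paragraph elements", which is not its real definition. A regular expression stands in for a DTD validator.

```python
import re

# class_section: (class, inheritance_list?, parameter_list?, %structure.model;)
# with %structure.model; simplified (an assumption) to (section | paragraph)*.
CLASS_SECTION = re.compile(
    r"class (inheritance_list )?(parameter_list )?((section|paragraph) )*"
)


def valid_class_section(child_tags):
    """Check a class_section's child element names against its content model."""
    # Each tag is followed by a space so the pattern can match whole names.
    return CLASS_SECTION.fullmatch("".join(t + " " for t in child_tags)) is not None


if __name__ == "__main__":
    print(valid_class_section(["class", "inheritance_list", "paragraph"]))
    print(valid_class_section(["class", "paragraph", "inheritance_list"]))
```

The second call is rejected because the inheritance list is out of place. A generic <group> that allowed inheritance_list anywhere could not catch that.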

> Summary
> -------
> Current DPS defines many element types for use in Python code
> representation.

Gut 'em.

> However, there are major advantages in only using the "simple"
> DPS nodes for all purposes.

True.

> This becomes simple and practical given a single extra, general
> purpose, element: <group>.

Nah. (For the same reason we don't use DOM: too general.)

> Furthermore, adding a standard attribute called "style" (or
> perhaps "role" - see [#style]_) seems to fulfil any other
> outstanding requirements.

Could be...

-- 
David Goodger    goodger@users.sourceforge.net    Open-source projects:
 - Python Docstring Processing System: http://docstring.sourceforge.net
 - reStructuredText: http://structuredtext.sourceforge.net
 - The Go Tools Project: http://gotools.sourceforge.net