[Doc-SIG] DPS DTDs

Tony J Ibbs (Tibs) tony@lsl.co.uk
Thu, 13 Sep 2001 11:01:40 +0100


David Goodger wrote:
> Writing this intro after writing the bulk below, I think we may
> simply be looking at this stuff from different angles, seeing
> different silhouettes of the same thing.

I thought so before, and I'm fairly sure so now!


>  I see a fundamental difference between an object representing 'a
>  module' and an object representing 'a module's documentation'. The
>  trees of the different types of objects may resemble each other in
>  shape at first, but the nature of the nodes is very different.

Yes, definitely - that's a good paragraph.

>  The tree resulting from the analysis of Python source (the 'parse
>  tree') is specific to the 'Python source' input mode of the DPS,
>  and will not be seen outside of this context.

Definitely.

At which point, and having read the rest of the email, I *think* I see
what's going on...

Since I'm developing in small chunks of time, and also because I'm
designing as I go along (easy in Python), the tree structure that I
create using DPS nodes is being evolved over time. Because I'm not
always very elegant at naming, all of the elements I'm introducing are
being called "py_xxx" (that is, those are their tagnames - not the same
as their class names).

I haven't considered whether *some* of the "py_xxx" elements are
actually identical (or sufficiently close with modification on one side
or the other) to elements that the DPS nodes module already defines. To
an extent, I don't care - that (for me) would be hypergeneralisation *at
this stage*, particularly since I'm still prepared to radically change
the "outer" tree structure if necessary.

To an extent, I'd characterise the difference between parse tree and
document tree like this: the parse tree allows one to reconstruct the
Python code (more or less!), and this is *not* a need of the
documentation tree - indeed, one may alter the order or structure of the
latter to suit documentation purposes, obscuring things that the parse
tree would consider important.

Because the requirement is that a DPS node tree be emitted so that
different Formatters (Writers) can use it, one *does* want a DPS node
tree (trivial point, but worth saying).

Because one is dealing with documenting Python, there is *likely* to be
Python-specific stuff in the "outer" parts of the tree (i.e., those bits
that are not inside docstrings).

The alternative would, of course, be to produce a DPS/reST document
*describing* the Python from scratch, and that's an alternative I hadn't
(directly) thought of - for instance::

    <section>
      <title>Python module Fred
      <section>
         <title>Globals
         <ordered_list>
            <item>fred

instead of::

    <py_module name="Fred">
       <py_globals>
          <py_global name="fred">

(the tagnames for both are wrong, but you get the point), but on the
whole I prefer the latter *at this stage*. It's also easier to
postprocess if one wants to (for instance) remove methods that start
with an underscore, and wants to postpone that decision as late as
possible - it makes sense to me that this might be the sort of thing one
wants to customise in the document tree.
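
A minimal sketch of that sort of late postprocessing, under loud
assumptions: the ``Node`` class is a stand-in invented for illustration
(it is *not* the real dps.nodes API), and the ``py_class`` and
``py_method`` tagnames are hypothetical, in the same spirit as the
"py_xxx" names above.

```python
# Hypothetical stand-in for a DPS-style node; NOT the real dps.nodes API.
class Node:
    def __init__(self, tagname, children=None, **attributes):
        self.tagname = tagname
        self.attributes = attributes
        self.children = list(children or [])

    def pformat(self, indent=0):
        """Return indented pseudo-XML: one start-tag per line, no end-tags."""
        attrs = "".join(' %s="%s"' % (key, value)
                        for key, value in sorted(self.attributes.items()))
        lines = ["%s<%s%s>" % ("    " * indent, self.tagname, attrs)]
        for child in self.children:
            lines.append(child.pformat(indent + 1))
        return "\n".join(lines)


def prune_private(node):
    """Drop py_method children whose name starts with '_', in place."""
    node.children = [
        child for child in node.children
        if not (child.tagname == "py_method"
                and child.attributes.get("name", "").startswith("_"))
    ]
    for child in node.children:
        prune_private(child)
    return node


# Assumed example tree; the tagnames are illustrative only.
tree = Node("py_module", name="Fred", children=[
    Node("py_class", name="Jim", children=[
        Node("py_method", name="public"),
        Node("py_method", name="_private"),
    ]),
])
prune_private(tree)
print(tree.pformat())
```

Because the method nodes still carry their names as attributes, the
filter is a simple test on the tree. Note that this naive rule would
also drop special methods such as ``__init__``, which is exactly the
kind of decision one might want to leave to customisation.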

The structure above is what I'm talking about when I talk about
extending DTDs - it *isn't* the parse tree, although I guess it's close
in some ways.

> Please relax and enjoy this message, safe in the knowledge that it's
> just idle discussion. Please don't let me stop you from doing your
> thing in your own way. I'm sure it will be useful no matter how things
> end up.

!!!

I see what I'm doing as prototyping. It would be nice (very nice) if
elements of it (even large chunks of it!) end up in the final product,
but that's not the main point - the main point is to demonstrate that
one *can* do things (always more satisfactory than handwaving), and to
have a reference point to push against (for instance, "ugh, that's
horrible, I can improve on that" - a valuable response).

> If you represent docstrings this way, how will you distinguish real
> literal_blocks from unparsed raw docstrings?

Damn - I hadn't thought of that.

Actually, a simple answer would be::

    <py_docstring parsed="1">
       <literal_block>

but it's undoubtedly better to do::

    <py_docstring format="reST">
       <literal_block>

since we *have* the docstring format "name" around (even if implicitly)
in the Python code. Actually, that last is probably an essential thing
to do.
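
A rough sketch of how the ``format`` attribute could resolve the
ambiguity. Everything here is assumed for illustration: elements are
plain dicts rather than DPS nodes, the ``parse`` argument is a
placeholder for whatever parser handles the named format, and the
attribute value ``"none"`` for an unparsed docstring is an invented
convention.

```python
def literal_block(text):
    """Build a literal_block element holding the given text."""
    return {"tag": "literal_block", "attrs": {}, "children": [text]}


def make_docstring(text, docformat=None, parse=None):
    """Wrap a docstring in a py_docstring element.

    If no format is known (or no parser is supplied), the raw text is
    kept in a literal_block and marked format="none" (an assumed value).
    Otherwise the parser's output becomes the element's children.
    """
    if docformat is None or parse is None:
        return {"tag": "py_docstring", "attrs": {"format": "none"},
                "children": [literal_block(text)]}
    return {"tag": "py_docstring", "attrs": {"format": docformat},
            "children": parse(text)}


def is_unparsed(element):
    """True if the element is a docstring that was never parsed."""
    return (element["tag"] == "py_docstring"
            and element["attrs"].get("format") == "none")


# Placeholder "parser": pretends the docstring is one real literal block.
def fake_parse(text):
    return [literal_block(text)]


raw = make_docstring("x = 1")
parsed = make_docstring("x = 1", docformat="reST", parse=fake_parse)
print(is_unparsed(raw), is_unparsed(parsed))
```

Both elements contain a ``literal_block`` as their only child, so only
the attribute on the enclosing ``py_docstring`` tells them apart.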

> ppdi.dtd is not meant to extend the DPS nodes tree outwards into the
> Python code, but to provide specialized elements useful for
> *documenting* Python code. It's a subtle distinction but important
> IMO.

Actually, I think it shows exactly the point I've been misexplaining -
what I *want* to say is what you're suggesting I should, I think.

> Let's take a simple example::
...OK...
> The parse tree might end up looking like this (using indentation to
> show structure)::
...OK...
> This parse tree gets transformed into the following document tree
> (again using indentation, so we can omit many end-tags)::
>
>     <document>
>         <title>Module <module>example.py</>
>         <section>
>             <title>Module Attributes
>             <module_attribute_section>
>                 <module_attribute>a
..etc..

Not entirely dissimilar to what I'm actually doing, although details
differ quite a lot.

> None of the parse tree objects survive intact to the document tree.

No, I never wanted to suggest that. Much of the *information* does,
though!

..analogy snipped..

> (The tree above is just my preliminary idea of what the final DPS
> tree should look like for a Python module. For instance, the
> '<section><title>Module <module>xxx' could easily become
> '<module_section><module>xxx'. In the end, these specialized elements
> may disappear, leaving generic sections and titles in their wake.)

Which is sort of what I realised earlier in this reply.

> (Hmm. Since the .pformat() of DPS trees uses indentation also, we
> could omit the end-tags. Would shorten the test data considerably, and
> reduce confusion with XML, which is good. I like this. Implementing
> it... now.)

Indentation is good, end tags are verbose - I agree!

> Perhaps it's just a question of degree. I'm seeing the tree closer to
> the final generic document representation, you're seeing it closer to
> the original parse tree. Sound about right?

Sort of - I think I would say that, at the moment, I'm seeing value in
document elements that represent Python elements more directly (the same
sort of value as having a <booktitle> term (e.g., <booktitle>Jim</>) in
a document about libraries, instead of turning it into the
"standardised" representation 'JIM') - it's *useful* to be able to talk
about a Python module or class *in the document space*.

(ah - that's the insight/comparison I've been striving for - in the same
way that in TeX I prefer to define \book{title} rather than use (e.g.)
{\sc title}, even though in the final output they may *look* the same)

> I must explain that I'm seriously considering a fourth
> component, the 'style' for lack of a better term, that takes the
> output of the input mode and parser and transforms it into the final
> doc tree. The input mode and output style may require more than what
> dps.nodes provides. The output styles for an input mode may be so
> tightly coupled as to be specific to that input mode.

Hmm - so that sounds like the interface that changes my <py_module>
based tree into a "standardised" <section><title> tree - is that right?
(xslt for DPS nodes!)
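
To make the question concrete, here is a sketch of such a transform,
mapping the <py_module> example from earlier in this message onto the
generic <section><title> form. The ``(tagname, attributes, children)``
tuples and the ``to_generic`` function are assumptions for illustration;
no such interface exists in the DPS as far as this message says.

```python
def to_generic(node):
    """Rewrite a Python-specific element as generic document elements."""
    tag, attrs, children = node
    if tag == "py_module":
        title = ("title", {}, ["Python module " + attrs["name"]])
        return ("section", {}, [title] + [to_generic(c) for c in children])
    if tag == "py_globals":
        title = ("title", {}, ["Globals"])
        items = [to_generic(c) for c in children]
        return ("section", {}, [title, ("ordered_list", {}, items)])
    if tag == "py_global":
        return ("item", {}, [attrs["name"]])
    # Fail loudly rather than silently dropping unknown elements.
    raise ValueError("no style rule for <%s>" % tag)


py_tree = ("py_module", {"name": "Fred"}, [
    ("py_globals", {}, [
        ("py_global", {"name": "fred"}, []),
    ]),
])
generic = to_generic(py_tree)
print(generic)
```

Each rule consumes a Python-specific element and emits only generic
ones, so the information that "fred" is a global survives only as its
position under the "Globals" title.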

> > And you had some, erm, interesting function definitions.
>
> Oh, I see what you mean, ones like this? ::
>
>     def standalone_uri(self, text, lineno,
>                        pattern=inline.patterns.uri,
>                        whole=inline.groups.uri.whole,
>                        email=inline.groups.uri.email):

Yep. Perfectly good Python code (if a bit confusing at first sight!),
but it showed me some representation I wasn't handling.
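
For illustration, the awkward part is that the default values are
dotted expressions rather than literals, so they have to be recorded as
source text. The sketch below uses the modern standard-library ``ast``
module (``ast.unparse`` needs Python 3.9 or later), which is an
assumption and not the tooling used for this prototype; the ``pass``
body is a placeholder added so that the quoted signature parses.

```python
import ast

# The signature quoted above, with a placeholder body so it parses.
SOURCE = '''
def standalone_uri(self, text, lineno,
                   pattern=inline.patterns.uri,
                   whole=inline.groups.uri.whole,
                   email=inline.groups.uri.email):
    pass
'''


def default_sources(source):
    """Map each function name to {argument name: default as source text}."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            args = node.args.args
            defaults = node.args.defaults
            # Defaults belong to the *last* len(defaults) positional args.
            with_defaults = args[len(args) - len(defaults):]
            result[node.name] = {
                arg.arg: ast.unparse(default)
                for arg, default in zip(with_defaults, defaults)
            }
    return result


for name, text in default_sources(SOURCE)["standalone_uri"].items():
    print(name, "=", text)
```

The defaults are never evaluated here, only turned back into text, so
it does not matter that ``inline`` is undefined in this snippet.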

Tibs (trying to agree furiously)

--
Tony J Ibbs (Tibs)      http://www.tibsnjoan.co.uk/
"How fleeting are all human passions compared with the massive
continuity of ducks." - Dorothy L. Sayers, "Gaudy Night"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)