[Doc-SIG] ST and DOM

Edward D. Loper edloper@gradient.cis.upenn.edu
Fri, 23 Mar 2001 10:27:05 EST


> I think that we should agree to agree on a DTD 

I'll agree, sort of..  One of the PEPs I'm writing has reduced
functionality, so its DTD will be a subset of the agreed-upon
DTD (in some sense, anyway)..

> Is this actually a separate PEP altogether? ("Doc-SIG - the PEP
> producer")

Hm.  I think you're getting a bit PEP-happy.  But I'll address that
issue later..

> > For now, I want to *only* consider global formatting.  We'll get to
> > local formatting (=colorising) later. :)
> 
> Reasonable. So we're defining "text blocks" and the structure above
> them.

Well, almost but not *quite*..  For example, I'd say that the following
is one text block::

    label: paragraph

But it's still got global formatting within it..  

> > There are 2 basic types of global formatting element: basic
> > elements (which are atomic, as far as global formatting goes);
> > and hierarchical elements (which are not).
> 
> OK - that's how I normally think too. But that distinction comes for
> free with using a DTD, really.

I don't see how it comes free..  You can choose to draw the lines
where you want..  (e.g., you were saying that anchors were local
formatting).  I used the following heuristic to divide things up:
    * Choose the smallest set of hierarchical elements such that:
        * paragraph is a basic element.
        * anything that can contain a basic element or a 
          hierarchical element is a hierarchical element.
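
Because every element is either basic or hierarchical, the second rule
reduces to "anything that can contain another global formatting element is
hierarchical".  A minimal sketch of that reading follows; the `CONTAINS`
table and the `classify` function are hypothetical names made up for
illustration, and the table is only an abbreviated guess at the elements
discussed in this thread.

```python
# Hypothetical containment table: element name -> global-formatting
# elements it may contain (abbreviated; not an agreed-upon list).
CONTAINS = {
    "paragraph": set(),
    "literalblock": set(),
    "doctestblock": set(),
    "ulist": {"ulistitem"},
    "ulistitem": {"paragraph", "ulist", "literalblock", "doctestblock"},
    "section": {"section", "paragraph", "ulist", "literalblock", "doctestblock"},
}


def classify(contains):
    """Split element names into (basic, hierarchical) sets.

    An element is hierarchical exactly when it can contain some other
    global-formatting element; everything else is basic.
    """
    hierarchical = {name for name, children in contains.items() if children}
    basic = set(contains) - hierarchical
    return basic, hierarchical


basic, hierarchical = classify(CONTAINS)
print(sorted(basic))
print(sorted(hierarchical))
```

Moving an element such as an anchor from one set to the other is then just a
matter of changing what the table says it can contain, which is the sense in
which the lines can be drawn where you want.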

> Agreed. Some additional elements are needed for callable object
> docstrings, though - informally, one also needs the "funcdesc"
> (apologies for the poor name) which is made up of a "signature" and an
> optional "summary-description" - for instance::
> 
> 	function(fred[,boolean]) -> integer -- This is silly.
> 
> or
> 
> 	function(fred[,boolean]) -> boolean
> 
> 	This is silly.

I disagree.  Isn't this the whole point of inspect?  To get that
information?  Why include it in the doc string?  That just seems
to make things very prone to errors.  What happens if the
signature doesn't match the real signature?  etc.
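
As a rough illustration of getting the signature from the object instead of
from its docstring, here is a sketch using `inspect.signature`, which is the
current standard-library call and postdates this message.  The example
function mirrors the quoted one; its parameter default and return annotation
are assumptions added so there is something to display.

```python
import inspect


def function(fred, boolean=False) -> int:
    """This is silly."""
    return 0


# The signature is read from the function object itself, so it cannot
# disagree with the real signature the way hand-written text can.
signature = inspect.signature(function)
summary = inspect.getdoc(function)
print(f"function{signature} -- {summary}")
```

A docstring tool built this way only needs the summary from the docstring;
the declaration part is generated.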

> >   * labelsection can only appear at top-level
> 
> Needs debating - I don't necessarily disagree, though.

I have trouble thinking of what it would mean for labelsections
to appear deep within a docstring.

> >   * anchorsection can only appear at top-level, and after all
> >     other elements of structuredtext.
> 
> I probably disagree. Probably.

I think that if we want anchors to be available anywhere
in a docstring, then we need to change them to be local 
markup, allow them *anywhere* that normal local markup is
allowed, and have them be invisible.  We would probably
also have to change the notation for them.  Then, if you
want to do an endnote, you just include an anchor at
the beginning of the footnote..  Something like::

    <anchor>'[foo]' Foo is a dummy word.

Where '<anchor>' is whatever syntax we decide to use for anchors.

I'm not saying this is a *good* thing to do, but I like
it better than allowing anchors, as they are currently defined,
to appear anywhere.  That just seems like a hack.  And I don't
think the meaning will be obvious to someone reading the
plaintext who's not familiar with ST (which it *should* be).

> >   * list items may not contain sections; but they can contain
> >     just about anything else (except top-level-only things).
> 
> I *do* agree (I too dislike sections in list items!)

The only potential problem I can see is people wanting to
use sections in DL items under label sections..  (e.g., 
when describing a parameter).  But I don't think we should
let them! :)

> Also to be reserved for future consideration: it seems natural to me to
> build a DOM tree that represents the whole module or package that is
> being dealt with, and "blat it out" in one go to the final format. This
> allows one to handle cross-referencing within a package (validate it,
> that is), rearrange the tree *as a whole*, and so on. So we will also
> want (optional) infrastructure *above* what you have defined.
> 
> I would propose that we have a toplevel node called something like
> "document" (heh, it's traditional), and appropriate nodes allowed below
> that called "module", "function", "class" and "method", with other
> appropriate nodes and attributes for storing the useful information one
> might want to cache thereon.

I think we still need a "structuredtext" element (or something similar),
and a distinct "module" element.. the reason being that the
"structuredtext" element can contain labeled sections, but a module
shouldn't..  Instead, it should contain author sections and
version sections etc..  So I think we should have 2 separate *top
level* interfaces, which share a bunch of stuff: 
    * the "structuredtext" top-level element is produced when
      we parse any random ST string, without knowing what it 
      represents.
    * the docstring top-level elements, like "module" and "function",
      are produced when we know what object the docstring describes.

The first would be produced by a parser; and the second by a 
docstring tool.
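
A minimal sketch of the two interfaces, using `xml.dom.minidom` from the
standard library.  Both `parse_st` and `document_function` are hypothetical
stand-ins: the "parser" only splits on blank lines, and the element names
are taken from this thread rather than from any finished DTD.

```python
from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()


def parse_st(text):
    """Hypothetical parser stand-in: wrap each blank-line-separated block
    in a "paragraph" under a "structuredtext" root."""
    doc = impl.createDocument(None, "structuredtext", None)
    for block in text.split("\n\n"):
        if not block.strip():
            continue  # skip empty blocks
        para = doc.createElement("paragraph")
        para.appendChild(doc.createTextNode(block.strip()))
        doc.documentElement.appendChild(para)
    return doc


def document_function(func):
    """Hypothetical docstring-tool stand-in: build a "function" root whose
    "description" reuses the blocks produced by parse_st."""
    doc = impl.createDocument(None, "function", None)
    declaration = doc.createElement("declaration")
    declaration.appendChild(doc.createTextNode(func.__name__))
    description = doc.createElement("description")
    parsed = parse_st(func.__doc__ or "")
    for child in list(parsed.documentElement.childNodes):
        # Copy each parsed block into the docstring tool's own document.
        description.appendChild(doc.importNode(child, True))
    doc.documentElement.appendChild(declaration)
    doc.documentElement.appendChild(description)
    return doc
```

The shared part is everything below the root: the docstring tool calls the
parser and re-homes its blocks, but chooses a different top-level element.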

I took some of your comments into account, and came up with this
revised DTD.  The same caveats apply to this one that applied to 
the last one. :)

Basic blocks::

    <!ELEMENT paragraph ...>
    <!ELEMENT key ...>
    <!ELEMENT literalblock ...>
    <!ELEMENT doctestblock ...>
    <!ELEMENT label ...>
    <!ELEMENT anchor ...>

Hierarchical blocks::

    <!ENTITY % list "(ulist | olist | dlist)">

    <!-- ** TOP-LEVEL ** -->
    <!ELEMENT structuredtext ((section | paragraph | %list; |
                               literalblock | doctestblock | 
                               labelsection)*, 
                              anchorsection*)>
    <!-- ** LISTS ** -->

    <!ELEMENT ulist (ulistitem+)>
    <!ATTLIST ulist bullet (star | o | dash) "star">
    <!ELEMENT ulistitem (paragraph | %list; |
                         literalblock | doctestblock)*>

    <!ELEMENT olist (olistitem+)>
    <!ATTLIST olist bullet NMTOKEN #REQUIRED>
    <!ELEMENT olistitem (paragraph | %list; |
                         literalblock | doctestblock)*>

    <!ELEMENT dlist (dlistitem+)>
    <!ELEMENT dlistitem (key, (paragraph | %list; |
                               literalblock | doctestblock)*)>

    <!-- ** SECTIONS ** -->
    <!ELEMENT section (heading, 
                       (section | paragraph | %list; |
                        literalblock | doctestblock)+)>
    <!ELEMENT anchorsection (anchor, 
                             (paragraph | %list; |
                              literalblock | doctestblock)*)>
    <!ELEMENT labelsection (label, 
                            (section | paragraph | %list; |
                             literalblock | doctestblock)+)>

Docstrings::

    <!ELEMENT module (description, info?)>
    <!ELEMENT function (declaration, description, info?)>
    <!ELEMENT description ((section | paragraph | %list; |
                            literalblock | doctestblock)*,
                           anchorsection*)>
    <!ELEMENT info (authors?, version?, status?, ...)>
    ...

Note that the description element does *not* include
labelsection elements...  
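
To make restrictions like this one concrete, here is a rough sketch of a
checker built on `xml.etree.ElementTree`.  The `ALLOWED` table and the
`misplaced` function are hypothetical; the table is abbreviated from the DTD
above, and the check covers child names only, not their order or how many
times they occur, so it is no substitute for a validating parser.

```python
import xml.etree.ElementTree as ET

BLOCKS = {"paragraph", "ulist", "olist", "dlist", "literalblock", "doctestblock"}

# Allowed child names per element (names only; order and counts ignored).
ALLOWED = {
    "structuredtext": BLOCKS | {"section", "labelsection", "anchorsection"},
    "description": BLOCKS | {"section", "anchorsection"},  # no labelsection
    "section": BLOCKS | {"section", "heading"},
    "labelsection": BLOCKS | {"section", "label"},
    "anchorsection": BLOCKS | {"anchor"},
    "ulist": {"ulistitem"},
    "olist": {"olistitem"},
    "dlist": {"dlistitem"},
    "ulistitem": BLOCKS,
    "olistitem": BLOCKS,
    "dlistitem": BLOCKS | {"key"},
}


def misplaced(root):
    """Yield (parent, child) tag pairs that the table does not allow."""
    for parent in root.iter():
        allowed = ALLOWED.get(parent.tag)
        if allowed is None:
            continue  # elements missing from the table are not checked
        for child in parent:
            if child.tag not in allowed:
                yield parent.tag, child.tag


tree = ET.fromstring(
    "<description><labelsection><label/><paragraph/></labelsection></description>"
)
print(list(misplaced(tree)))
```

The same table also captures the rule that list items may not contain
sections, since "section" is absent from the entries for the list items.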

I said that ordered list bullets are required.. is that
reasonable?  Should they be '#IMPLIED' instead?

-Edward