[Doc-SIG] ST and DOM

Tony J Ibbs (Tibs) tony@lsl.co.uk
Fri, 23 Mar 2001 14:21:22 -0000


Edward D. Loper wrote:
> So I was just looking through the XHTML DTD, and it doesn't really
> seem like what we want.

It is a bit odd, isn't it.

> But Tibs' points about the DTD representation
> being important as a well-defined interface to ST are well-taken..

Good.

> Thus, I'd like to hash out some of the involved issues so I can
> put the appropriate stuff in my PEP. :)

I think that we should agree to agree on a DTD - that has an advantage
for us in that we can both use the knowledge gained/shared, and it has
*definite* advantages for (a) people deciding which PEP they want (if
not both) and (b) tool users trying to take advantage of either/both of
our packages. We might even get STNG to agree...

Is this actually a separate PEP altogether? ("Doc-SIG - the PEP
producer")

> For now, I want to *only* consider global formatting.  We'll get to
> local formatting (=colorising) later. :)

Reasonable. So we're defining "text blocks" and the structure above
them.

(for those who don't know it, the major oddity of the XHTML DTD is that
it *doesn't* draw this distinction, so one gets the strange sort of
concept of:

	<structure element> contains:
	    <markup element>
	    <#text node>
	    <markup element>
	    <structure element>
	    <structure element>

which is *distinctly* odd to someone trying to work with a non-XML
document, and is one (although not the major) reason why I made my
internal datastructure non-DOM).
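
To make that oddity concrete, here is a rough Python sketch using the
standard xml.dom.minidom module. The fragment and the helper name
child_kinds are made up for illustration; the point is only that text,
inline markup and block structure all turn up as siblings under one
element:

```python
from xml.dom import minidom

# Hypothetical XHTML-style fragment: a <div> holding bare text, inline
# markup and block-level children side by side.
FRAGMENT = "<div>Some <em>text</em><p>A paragraph</p><ul><li>item</li></ul></div>"


def child_kinds(xml_text):
    """Return a list naming each direct child of the root element."""
    root = minidom.parseString(xml_text).documentElement
    kinds = []
    for node in root.childNodes:
        if node.nodeType == node.TEXT_NODE:
            kinds.append("#text")
        elif node.nodeType == node.ELEMENT_NODE:
            kinds.append(node.tagName)
    return kinds


print(child_kinds(FRAGMENT))
```

A DTD that separates "text blocks" from the structure above them would
never let the text node and the list sit at the same level.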

> There are 2 basic types of global formatting element: basic
> elements (which are atomic, as far as global formatting goes);
> and hierarchical elements (which are not).

OK - that's how I normally think too. But that distinction comes for
free with using a DTD, really.

> I really think that the DOM tree should capture the *structure* of
> the formatted string..  To me, that means that it's weird to
> define a list item to be "a text block that *starts*
> a list item"...  Anyway, I propose that we use something similar to
> the following scheme:

Agreed. Some additional elements are needed for callable object
docstrings, though - informally, one also needs the "funcdesc"
(apologies for the poor name) which is made up of a "signature" and an
optional "summary-description" - for instance::

	function(fred[,boolean]) -> integer -- This is silly.

or

	function(fred[,boolean]) -> integer

	This is silly.

(the two examples are identical in "meaning"). This is *important* for
docstrings, and should not be forgotten now if we are tailoring a
solution for such.

Maybe they should be "callable", "callable_signature",
"callable_summary" (or maybe one can elide the "callable" on the
sub-elements).

The following is probably wrong (and the names are too long!):

<!ELEMENT callable_info (callable_signature, callable_summary?)>
<!ELEMENT callable_signature (codeblock)>  <!-- but constrained to be one line -->
<!ELEMENT callable_summary (paragraph)>    <!-- but ditto -->
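
As a rough illustration of what a parser would have to do to treat the
two layouts as the same thing, here is a minimal Python sketch. The
function name split_funcdesc and the rule "a ' -- ' on the first line
separates signature from summary" are my assumptions for the example,
not an agreed specification:

```python
def split_funcdesc(docstring_head):
    """Split the start of a callable's docstring into (signature, summary).

    Handles two layouts: 'signature -- summary' on one line, and a
    signature line followed by a blank line and a one-line summary.
    The summary is None when there isn't one.
    """
    lines = [line.strip() for line in docstring_head.strip().splitlines()]
    if not lines:
        return None, None
    first = lines[0]
    if " -- " in first:
        signature, summary = first.split(" -- ", 1)
        return signature.strip(), summary.strip()
    # Otherwise the summary, if any, is the first non-blank line after
    # the signature line.
    for line in lines[1:]:
        if line:
            return first, line
    return first, None


one_line = "function(fred[,boolean]) -> integer -- This is silly."
two_part = "function(fred[,boolean]) -> integer\n\nThis is silly."
print(split_funcdesc(one_line) == split_funcdesc(two_part))
```

Either way one ends up with a signature and an optional summary, which
is all the DTD fragment above is trying to say.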

> Basic units::
>
>     <!ELEMENT paragraph ...>
>     <!ELEMENT bullet ...>
>     <!ELEMENT literalblock ...>
>     <!ELEMENT doctestblock ...>
>     <!ELEMENT label ...>
>     <!ELEMENT anchor ...>
>
> Hierarchical units::
>
>     <!ELEMENT structuredtext ((section | paragraph | list |
>                                literalblock | doctestblock |
>                                labelsection)*,
>                               anchorsection*)>
>     <!ELEMENT section (heading,
>                        (section | paragraph | list |
>                         literalblock | doctestblock)+)>
>     <!ELEMENT list (listitem+)>
>     <!ELEMENT listitem (bullet,
>                         (paragraph | list |
>                          literalblock | doctestblock)*)>
>     <!ELEMENT anchorsection (anchor,
>                              (paragraph | list |
>                               literalblock | doctestblock)*)>
>     <!ELEMENT labelsection (label,
>                             (section | paragraph | list |
>                              literalblock | doctestblock)+)>
>
> Some notes on this scheme..  Some of these might end up getting
> changed..
>   * labelsection can only appear at top-level

Needs debating - I don't necessarily disagree, though.

>   * anchorsection can only appear at top-level, and after all
>     other elements of structuredtext.

I probably disagree. Probably.

>   * list items may not contain sections; but they can contain
>     just about anything else (except top-level-only things).

I *do* agree (I too dislike sections in list items!)

>   * anchor sections may not contain sections; but they can
>     contain just about anything else (except top-level-only
>     things).

Makes sense.

>   * labelsections can contain anything except top-level-only
>     things.  However, particular labels may place further
>     restrictions on their contents..

Agreed.
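
For what it's worth, the containment rules discussed above are easy to
check mechanically. The following is only a sketch in Python (standard
xml.etree.ElementTree): the ALLOWED table is my loose transcription of
the quoted scheme, it ignores ordering and the +/* counts, and the name
violations is invented for the example:

```python
import xml.etree.ElementTree as ET

BLOCKS = {"paragraph", "list", "literalblock", "doctestblock"}

# Allowed child element names per hierarchical unit, transcribed loosely
# from the quoted scheme. Ordering and repetition are NOT checked.
ALLOWED = {
    "structuredtext": BLOCKS | {"section", "labelsection", "anchorsection"},
    "section": BLOCKS | {"heading", "section"},
    "list": {"listitem"},
    "listitem": BLOCKS | {"bullet"},
    "anchorsection": BLOCKS | {"anchor"},
    "labelsection": BLOCKS | {"label", "section"},
}


def violations(root):
    """Return (parent, child) tag pairs that the table above does not allow."""
    bad = []
    for parent in root.iter():
        allowed = ALLOWED.get(parent.tag)
        if allowed is None:
            continue  # basic units: their contents are not checked here
        for child in parent:
            if child.tag not in allowed:
                bad.append((parent.tag, child.tag))
    return bad


# A section nested inside a list item should be reported.
sample = ET.fromstring(
    "<structuredtext><list><listitem><bullet/>"
    "<section><heading/><paragraph/></section>"
    "</listitem></list></structuredtext>"
)
print(violations(sample))
```

A real validating parser working from the DTD would of course do this
(and the ordering checks) for us.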

I would personally prefer to lose "bullet" as such, and retain only
"key" or "description" for descriptive lists. I do not wish the renderer
to take the bullet (or number sequence) as anything other than a hint,
and thus I think it should be an attribute, not an element...
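
A small sketch of what I mean, in Python with the standard
xml.etree.ElementTree. The attribute name "bullet" and the helper
make_list are just placeholders for the example; the point is that the
hint rides along on the listitem and a renderer is free to ignore it:

```python
import xml.etree.ElementTree as ET


def make_list(items, bullet_hint=None):
    """Build <list><listitem><paragraph>..., keeping the bullet as an
    optional attribute on each listitem rather than a child element."""
    lst = ET.Element("list")
    for text in items:
        attrs = {"bullet": bullet_hint} if bullet_hint else {}
        item = ET.SubElement(lst, "listitem", attrs)
        ET.SubElement(item, "paragraph").text = text
    return lst


tree = make_list(["first", "second"], bullet_hint="*")
print(ET.tostring(tree, encoding="unicode"))
```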

Also to be reserved for future consideration: it seems natural to me to
build a DOM tree that represents the whole module or package that is
being dealt with, and "blat it out" in one go to the final format. This
allows one to handle cross-referencing within a package (validate it,
that is), rearrange the tree *as a whole*, and so on. So we will also
want (optional) infrastructure *above* what you have defined.

I would propose that we have a toplevel node called something like
"document" (heh, it's traditional), and appropriate nodes allowed below
that called "module", "function", "class" and "method", with other
appropriate nodes and attributes for storing the useful information one
might want to cache thereon.

This is how docutils currently works (well, more or less).

But as I said, for future consideration.
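
Still, to show the sort of thing I have in mind, here is a toy Python
sketch (standard xml.etree.ElementTree again). It is NOT how docutils
is actually written: the input shape, the "name" attribute and the
function names are all invented for the example, and "function" nodes
are left out for brevity. It builds one tree for a whole package and
then checks cross-references against it:

```python
import xml.etree.ElementTree as ET


def build_document(modules):
    """Build a <document> tree from {module: {class: [methods]}}."""
    doc = ET.Element("document")
    for mod_name, classes in modules.items():
        mod = ET.SubElement(doc, "module", name=mod_name)
        for cls_name, methods in classes.items():
            cls = ET.SubElement(mod, "class", name=cls_name)
            for meth_name in methods:
                ET.SubElement(cls, "method", name=meth_name)
    return doc


def known_names(doc):
    """Collect dotted names for every module, class and method node."""
    names = set()

    def walk(node, prefix):
        for child in node:
            full = prefix + [child.get("name")]
            names.add(".".join(full))
            walk(child, full)

    walk(doc, [])
    return names


def unresolved(doc, references):
    """Return the references that do not name anything in the tree."""
    return sorted(set(references) - known_names(doc))


doc = build_document({"spam": {"Eggs": ["fry"]}})
print(unresolved(doc, ["spam.Eggs.fry", "spam.Ham"]))
```

Because the whole package is in one tree, the check can be done before
anything is written out to the final format.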

> Now, this is not meant to be a final DTD..  For example, it might
> make sense to split list, listitem, and bullet into 3: dlist, olist,
> ulist, etc..  But does this *overall* structure seem reasonable?

I think it probably does make such sense (I'd prefer it that way). But I
agree, it's a good start.

Do we have anyone around, listening, who actually knows how one is
*meant* to design a *good* DTD (i.e., I'm sure we can come up with
something workable, but are there conventions, known boobytraps, etc.,
that we can be helped with to get something really good?)

> For comparison, Tibs has a DTD at the bottom of
> <http://homepage.ntlworld.com/tibsnjoan/docutils/STpy.html>,
> although I'm not sure if it's up-to-date.  It seems to go against
> some of the things he's been saying on doc-sig lately.. (??).

It's very old, it was very preliminary, and it's just plain wrong. So
ignore it.

(main task this weekend: rewrite STpy.html,
possibly to be preempted by all the "real life" things I also have to
do...)

Tibs
--
Tony J Ibbs (Tibs)      http://www.tibsnjoan.co.uk/
Which is safer, driving or cycling?
Cycling - it's harder to kill people with a bike...
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)