[Doc-SIG] Structured Text

Wed, 7 Mar 2001 10:22:42 -0000

Elsewhere David apologised for spelling my name Tibbs rather than Tibs -
that's OK, I'm used to it, given my surname (the reason it has a single
"b" is only that the original cat tag had "Tibs" and not "Tibbs").

David Goodger also wrested (wrost?) time from house moving recovery to
say:
> My approach is to think about it two-dimensionally, in X-Y space:
>
>     1. This is a list item::
>
>            This is some literal text (see the '::' above)
>
>        I can't make this paragraph a "child" of the list item.
>
> The "I" lines up with the "T" in the first line. A block
> diagram helps:
>
>     +----+------------------------------------+
>     | 1. | (list item)                        |
>     +----| +----------------------+           |
>          | | (paragraph)          |           |
>          | | This is a list item: |           |
>          | +----------------------+           |
>          |   +------------------------------+ |
>          |   | (code block)                 | |
>          |   | This is some literal text... | |
>          |   +------------------------------+ |
>          | +----------------------+           |
>          | | (paragraph)          |           |
>          | | I can't make this... |           |
>          | +----------------------+           |
>          +------------------------------------+
>
> The indentation of the code block is just for emphasis
> (unless you want paragraphs to contain body elements,
> a subject for intense debate ;-).

Hmm. I think of the indentation as Python-esque, and just "pretend" I'm
blind to anything but the first line (for non-literal paragraphs,
anyway). I'm afraid I've been exposed to document tree structures for
too long to want to think of it as block diagrams.

Anyway, I have very strong views on how correct markup should work (heh,
I have a TeX background, and I'm a pedant). HTML contravenes some of
them (sections in list items! - hah! "some" people may consider one
shouldn't have an H4 immediately following an H2 - hah!), and
StructuredText a whole different set. But that's *my* problem - ST is
also intensely pragmatic, and cleverly designed so that many of the
things people initially worry about never actually happen in practice
(given a close reading of the spec, even the "informal" STClassic spec).

Within the limitations of what one can do with a "nearly plain text"
approach that *is* trying to mix markup and presentation, it does
amazingly well (and I'm currently having a real problem with remembering
to type <em>...</em> when I'm writing HTML, instead of *...*(!)).

Unfortunately, of course, the block diagram is wrong, for all forms of
current ST - the correct diagram is::

          +-------------------------+
          | (list item)             |
          | 1. This is a list item: |
          +-------------------------+
               +------------------------------+
               | (literal block)              |
               | This is some literal text... |
               +------------------------------+
            +----------------------+
            | (paragraph)          |
            | I can't make this... |
            +----------------------+

since a paragraph's indentation is measured from the very start of the
paragraph, and that is calculated *before* the list sequence number is
removed.
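
To make the difference between the two readings concrete, here is a
minimal sketch in Python. It is not the code from STpy or STNG; the
names `LIST_MARKER`, `paragraph_indent` and `indent_after_marker` are
made up for illustration, and the pattern only recognises simple
"1. " style markers.

```python
import re

# Hypothetical pattern for an enumerated list marker such as "1. ".
LIST_MARKER = re.compile(r"\d+\.\s+")


def paragraph_indent(paragraph):
    """Column of the first non-space character of the first line.

    This is the rule described above: the list marker is still in
    place when the indentation is measured.
    """
    first_line = paragraph.splitlines()[0]
    return len(first_line) - len(first_line.lstrip(" "))


def indent_after_marker(paragraph):
    """Column where the text starts once a list marker is skipped.

    This corresponds to the other reading, in which the "I" lines up
    with the "T" of the list item's text.
    """
    first_line = paragraph.splitlines()[0]
    indent = paragraph_indent(paragraph)
    match = LIST_MARKER.match(first_line, indent)
    return match.end() if match else indent


# The three paragraphs of the quoted example, built with explicit
# indentation so the columns are unambiguous.
example = [
    " " * 4 + "1. This is a list item::",
    " " * 11 + "This is some literal text (see the '::' above)",
    " " * 7 + "I can't make this paragraph a \"child\" of the list item.",
]

indents = [paragraph_indent(p) for p in example]
text_column = indent_after_marker(example[0])
```

With the marker left in place the three paragraphs sit at three
different columns; only if the marker is skipped does the third
paragraph line up with the list item's text.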

What I would call the "traditional" branch of ST (STClassic, STNG - the
Zope strand) aims to separate paragraph generation from anything else -
it aggressively regards paragraphs as separated by blank lines, and the
document structure is built up only using paragraphs generated by such a
method (this, I think, is even more so in STNG, where they are trying
for a very "clean" structure of classes, with each phase of parsing
separated strongly from each other phase - the aim being to allow
subclassing for customisation.) Anyone reading *my* code will see I have
abandoned such an approach - basically because I wanted to allow list
items to start paragraphs (so they need to be detected early), and
(later on) because if one is going to handle literal "paragraphs"
properly, one needs to handle that specially as well - a simple "detect
paragraphs and then markup" won't do it; literalising of paragraphs
needs to be an intermediate stage.
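
As a rough illustration of what "an intermediate stage" might look
like, here is a small Python sketch. It is an assumption-laden toy, not
the actual STpy or STNG code: the function names are invented, it
treats a paragraph ending in '::' as introducing literal text, and it
regards every following paragraph that is indented further than that
paragraph as literal.

```python
import re


def split_paragraphs(text):
    """Phase 1: paragraphs are runs of lines separated by blank lines."""
    chunks = re.split(r"\n(?:[ \t]*\n)+", text.strip("\n"))
    return [chunk for chunk in chunks if chunk.strip()]


def indent_of(paragraph):
    """Indentation of a paragraph, taken from its first line only."""
    first_line = paragraph.splitlines()[0]
    return len(first_line) - len(first_line.lstrip(" "))


def mark_literals(paragraphs):
    """Intermediate phase: tag literal paragraphs before any markup.

    Returns (kind, paragraph) pairs where kind is "text" or "literal".
    Later markup phases would leave "literal" paragraphs untouched.
    """
    result = []
    parent_indent = None  # indent of the most recent '::' paragraph
    for para in paragraphs:
        if parent_indent is not None and indent_of(para) > parent_indent:
            result.append(("literal", para))
            continue
        parent_indent = None
        result.append(("text", para))
        if para.rstrip().endswith("::"):
            parent_indent = indent_of(para)
    return result


sample = (
    "Intro::\n"
    "\n"
    "    x = *not emphasis*\n"
    "\n"
    "    y = 2\n"
    "\n"
    "Back to text with *emphasis*.\n"
)

kinds = [kind for kind, _ in mark_literals(split_paragraphs(sample))]
```

The point of the sketch is only the ordering: the literal paragraphs
have to be identified after paragraph splitting but before anything
looks for `*...*`, otherwise the asterisks inside them get marked up.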

Basically, in my view, processing ST-style texts well *requires* a
hybrid approach - I would assert that the results provided by a more
"theoretical" (in some sense) approach are not as satisfying. This is
probably, of course, a "religious" disagreement, and I will be
interested to see what Jim Fulton's people at Zope manage to do with
STNG (I have a feeling that they didn't throw away enough code before
starting the project, but that's a comment from fairly strong
ignorance).

Back to the example. As I was saying, in both branches of ST development
(NG and py), the *very start* of the paragraph determines its
indentation. *Because* this is such a fundamental structuring decision,
I would be very cautious about changing it (technically, in
STpy/docutils, it wouldn't be too hard, off the top of my head, to do
what you'd like). My experience, as I say further above, is that the
basic ideas of ST are *very* solid in their pragmatic usefulness, and I
would need to think about a lot of text examples before wanting to
change something like that.

Also, I am already worried a little about the incompatibilities between
STpy and STNG (since STNG is happy to be somewhat incompatible with
STClassic, I shall take the same stance on that). Luckily there aren't
many (extra features such as '#...#' will ultimately be selectable
anyway - this makes sense for processing .stx files that have nothing to
do with Python) - I think the *main* one may well be the allowance of
list items starting new paragraphs, which was something STNG was also
considering at one stage. (Of course, if the incompatibilities are few
and abstruse, one *could* argue that STNG isn't taking such a wrong
approach, after all, couldn't one!)

> P.S. I haven't disappeared; just incredibly busy. I'm still
> lurking on the list.

And I still intend to comment on your documents at some stage.

Tibs

	(hmm - the exclamation count in this email is too high)
	(there, I've removed some - perhaps I should insert
	 some more parenthesised clauses to compensate...)

--
Tony J Ibbs (Tibs)      http://www.tibsnjoan.co.uk/
Give a pedant an inch and they'll take 25.4mm
(once they've established you're talking a post-1959 inch, of course)
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)