[Doc-SIG] formalizing StructuredText

Tony J Ibbs (Tibs) tony@lsl.co.uk
Thu, 22 Mar 2001 10:10:11 -0000


Edward D. Loper wrote:
> (I should be done with STminus002 relatively soon).

Good. As you said to Ka-Ping Yee elsewhere, it is good to have both a
simple and a complex ST variant to choose from (although I would add,
of course I would, that the complexity is either inherited from
STClassic, or was asked for in past iterations of this SIG).

> > 3. Local references (which look like '[this]' or '[1]') are now
> > supported. The "anchor" for a local reference must be at the
> > start of a paragraph (in future releases I would expect it to
> > *start* a new paragraph if at the start of a line), and looks like::
> >
> > 	..[this]
>
> So... are anchors always hrefs?  Or can they be generic footnotes?  Or
> references for a references section?  How should we deal with these
> when we're using something other than HTML (e.g., LaTeX) to render
> the string?  If anchors can be footnotes or references, how does the
> renderer decide what to do with them?

Erm - no, in HTML terms, anchors are names. The obvious HTML translation
of the DOM tree for a local reference and anchor is::

	Some text containing a local reference to
	<a href="#this">[this]</a>.

	<a name="this">[this]</a> is the anchor.

In the DOM tree, I have to decide what to put into the "reference", and
at the moment I follow HTML/XML conventions and store what you see -
that is, the reference element has an attribute whose content is the
string '#this'. I use the same attribute name as I use for other links.
The advantage of this is twofold - it means we have only one way of
linking within the document (which will map easily to both HTML and to
XLinks, although we are only using the simplest subset of XLinks!), and
it means a user can regard::

	[This] is a local reference

and::

	"This":#this is a local reference

as the same, which isn't much use *within* a document, but is *very*
useful for allowing links from outside.
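
As a rough illustration of that equivalence (a sketch only: the two
regular expressions and the name 'link_target' are invented here and
are not the ones the parser actually uses), both spellings can be
reduced to the same stored target:

```python
import re

# Hypothetical patterns, for illustration only.
LOCAL_REF = re.compile(r'\[(\w+)\]')          # e.g. [this]
EXPLICIT_LINK = re.compile(r'"[^"]+":(\S+)')  # e.g. "this":#this


def link_target(text):
    """Return the link target of the first link in text, or None."""
    m = EXPLICIT_LINK.search(text)
    if m:
        # The target is stored exactly as written after the colon.
        return m.group(1)
    m = LOCAL_REF.search(text)
    if m:
        # Store the target the way HTML/XML would: '#' plus the name.
        return '#' + m.group(1)
    return None
```

Both 'link_target' calls on the two example lines give the same string,
so later stages need only one kind of link. How the name's case is
handled is left aside in this sketch.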

As to using HTML/XML type links - well, we already had to choose URLs
for our external links (or think of it as using simple XLinks if that
makes you happier) - this makes consistent sense if we are using a DOM
tree to represent our document, anyway. It makes sense to continue this
for local references. A tool like TeX would need some untangling of the
'#this' to just 'this' for use in its '\xref', but that's hardly
difficult.
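
To make the rendering side concrete, here is a minimal sketch, assuming
the reference element carries its target as a string such as '#this'.
The function names are invented for this example and are not taken from
any existing code:

```python
from html import escape


def html_reference(href, text):
    """Render a reference as an HTML link, using the stored target as-is."""
    return f'<a href="{escape(href, quote=True)}">{escape(text)}</a>'


def html_anchor(name, text):
    """Render the anchor that a local reference points at."""
    return f'<a name="{escape(name, quote=True)}">{escape(text)}</a>'


def tex_label(href):
    """Untangle '#this' to 'this' for a TeX-style cross-reference.

    Targets that are not local (no leading '#') are returned unchanged.
    """
    return href[1:] if href.startswith('#') else href
```

The HTML writer can use the stored string directly, while a TeX writer
would call something like 'tex_label' first and hand the result to
whatever cross-reference macro it uses.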

> I'll add this too.  BTW, how are you currently handling
> things like this::
>
>   1. some text
>
>      some more text

The list item is at indentation N and the next paragraph at indentation
N+3, so initially the paragraph is the first child of the list item. The
"flattening" phase will note that the first item is a list item and the
second a paragraph (tags "oitem" and "para"), and bring the paragraph up
to be a sibling of the list item.

In summary, the initial internal structure is::

	<oitem>
	    <para>

and this gets "flattened" to be::

	<oitem>
	<para>

which then gets translated into the DOM tree as elements with those tags
(both will, of course, be children of a surrounding '<olist>' element).

(if we had::

	This is a paragraph.

	   And so is this.

then the flattening phase would say to itself "aha - a paragraph within
a paragraph - presumably the user *meant* something by that", and in
this case it would produce::

	<para>
	<block>
	   <para>

(clearly we don't regard a paragraph inside a paragraph as being very
meaningful in any real sense, but it seems a pity to waste the
indentation that the user put in so carefully, and this is the obvious
meaning to take). In an HTML rendering, I would expect 'block' to become
'blockquote'.)
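
The two flattening cases above can be sketched as follows. This is a
toy model under stated assumptions, not the real implementation: the
'Node' class, 'flatten' and 'outline' are names invented here, and the
rule that *every* child of an "oitem" is promoted is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in the initial, indentation-derived structure."""
    tag: str
    children: list = field(default_factory=list)


def flatten(nodes):
    """Return a new sibling list with indentation-only nesting resolved."""
    result = []
    for node in nodes:
        children = flatten(node.children)
        if node.tag == "oitem":
            # Paragraphs indented under a list item become its siblings.
            result.append(Node("oitem"))
            result.extend(children)
        elif node.tag == "para" and children:
            # A paragraph inside a paragraph: keep the user's indentation
            # by wrapping the inner content in a 'block' sibling.
            result.append(Node("para"))
            result.append(Node("block", children))
        else:
            result.append(Node(node.tag, children))
    return result


def outline(nodes, depth=0):
    """Return one indented '<tag>' line per node, for inspection."""
    lines = []
    for node in nodes:
        lines.append("    " * depth + f"<{node.tag}>")
        lines.extend(outline(node.children, depth + 1))
    return lines
```

Calling 'outline(flatten([Node("oitem", [Node("para")])]))' lists the
two tags at the same level, and the paragraph-within-paragraph case
yields a "para" followed by a "block" holding the inner "para".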

> > 5. The RE used for detecting URLs has become more
> > sophisticated. There are some associated rules
>
> Hm.. I don't look forward to formalizing this, and trying to get STNG
> to agree with your regexps :)

STNG has its own REs. They don't make much sense to me (or didn't last
time I looked at them). In some cases, they just didn't work very well.
Oh well.

But I don't see why *formalising* it is a problem?

> Note also that it should be possible to generate the "long RE
> expression" in a *principled* way, given a formalization, so that
> it will detect *all* errors, not just *common* errors.

This I don't understand - I'm not sure what you mean by "in a principled
way", and I'm also not sure what you mean by "all errors, not just
common errors".
But this will doubtless become clearer to me as STminus progresses (I
begin to suspect you may regret that name some day, as it becomes more
capable and more clearly sufficient-to-itself).

> Ok, in the formalization system I set up, I divided everything into
> "valid" and "undefined".  I see a good argument for further dividing
> "undefined," though..  So I'll redefine my terms, as such:
>
>   valid   -- The string has a unique, predictable result.  this is the
>              same result that it will have in all future versions.
>   invalid -- The string does not have a unique, predictable result
>   illegal   -- The string will never have a unique,
>                predictable result
>       undefined -- The string does not currently have a unique,
>                    predictable result, but it may in a future version.
>
> Is that acceptable terminology?  (I'll try to remember to stick to
> it)

I'm not sure I'd bother to separate the middle two ("never" is a big
concept, and four is somehow more uncomfortable than three), but
otherwise I'd be happy to go with those...

Tibs

--
Tony J Ibbs (Tibs)      http://www.tibsnjoan.co.uk/
Give a pedant an inch and they'll take 25.4mm
(once they've established you're talking a post-1959 inch, of course)
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)