[Doc-SIG] formalizing StructuredText

Edward D. Loper edloper@gradient.cis.upenn.edu
Wed, 21 Mar 2001 12:02:02 EST


> 1. Newlines are preserved again in non-literal paragraphs (Edward Loper
> convinced me that the benefits outweighed the problems).
> 2. Newlines are not allowed within literal and Python literal strings.

Yay!  I'll code that up in STminus002 as soon as I get a chance.
(I should be done with STminus002 relatively soon).

> 3. Local references (which look like '[this]' or '[1]') are now
> supported. The "anchor" for a local reference must be at the start of a
> paragraph (in future releases I would expect it to *start* a new
> paragraph if at the start of a line), and looks like::
> 
> 	..[this]

So... are anchors always hrefs?  Or can they be generic footnotes?  Or
references for a references section?  How should we deal with these
when we're using something other than HTML (e.g., LaTeX) to render
the string?  If anchors can be footnotes or references, how does the
renderer decide what to do with them?

> 4. List items and local references may be "empty" paragraphs, but there
> may still be some unresolved issues with respect to newlines - I'm not
> sure that::
> 
> 	1.
> 	  Some text
> 
> is allowed (it probably should be, if the form with a blank line between
> those two lines *is* allowed).

I'll add this too.  BTW, how are you currently handling things like this::

  1. some text

     some more text

Is that a list item with 2 paragraphs, or a list item with some contents
and 1 subparagraph, or something else?  I.e., how would it get rendered
in whatever XML-like thing you're using?
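
To make the two readings concrete, here is a rough sketch of the trees I
have in mind.  The element names ("li", "para") are made up for
illustration; I don't know what your XML-like output actually uses.

```python
import xml.etree.ElementTree as ET


def two_paragraphs():
    """Reading A: the list item holds two sibling paragraphs."""
    li = ET.Element("li")  # hypothetical element name
    ET.SubElement(li, "para").text = "some text"
    ET.SubElement(li, "para").text = "some more text"
    return li


def contents_plus_subparagraph():
    """Reading B: the first line is the item's own contents, and the
    second block is a single child paragraph."""
    li = ET.Element("li")  # hypothetical element name
    li.text = "some text"
    ET.SubElement(li, "para").text = "some more text"
    return li


print(ET.tostring(two_paragraphs(), encoding="unicode"))
print(ET.tostring(contents_plus_subparagraph(), encoding="unicode"))
```

A formalization has to pick one of these (or something else again), since
a renderer will treat them differently.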

> 5. The RE used for detecting URLs has become more sophisticated. There
> are some associated rules 

Hm... I don't look forward to formalizing this, or to trying to get
STNG to agree with your regexps :)

> That approach is what I meant when I talked about "a long RE for
> detecting common errors", and it is a sensible approach *if one is
> validating* - but the results should be warnings, 'cos one of the points
> of ST, originally, is that users should be able to "push the corners" a
> bit.

Or errors, if the user asks for them to be errors. :)

Note also that, given a formalization, it should be possible to
generate the "long RE" in a *principled* way, so that it detects
*all* errors, not just *common* ones.
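
As a minimal sketch of what I mean (assuming a toy grammar, not the real
ST rules): if the formalization can be written as one RE for *valid*
strings, then anything that fails to match it in full is an error, and
the caller chooses between a warning and a hard error.  The grammar, the
name "check" and the "strict" flag are all invented for this example.

```python
import re
import warnings

# Toy formalization (an assumption, NOT the real ST grammar).  A paragraph
# is a sequence of:
#   plain characters   -- anything except * and '
#   *emphasized* spans -- no newlines, no nested markup
#   'literal' spans    -- no newlines
VALID = re.compile(r"(?:[^*']|\*[^*'\n]+\*|'[^'\n]*')*")


def check(text, strict=False):
    """Return True if text is valid under the toy grammar.

    Invalid text produces a warning by default, or a ValueError if the
    caller asks for errors with strict=True.
    """
    if VALID.fullmatch(text) is not None:
        return True
    message = "string is not valid under the toy grammar: %r" % (text,)
    if strict:
        raise ValueError(message)
    warnings.warn(message, stacklevel=2)
    return False
```

Because the test is "does the whole string match the grammar", it flags
every invalid string rather than a hand-picked list of mistakes.  E.g.,
check("an 'unterminated literal") warns and returns False, and the same
call with strict=True raises ValueError instead.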

> > But from the point of view of someone formalizing the language, saying
> > "there's an ambiguity" is no good.  I have to either explicitly say
> > "it's illegal" (=undefined) or "xyz is the correct answer."
> 
> Oh, I agree, and it's a good thing to do. But you *do* have a third
> option, which is the "this behaviour produces undefined results", which
> is not *quite* the same as "illegal".

Ok, in the formalization system I set up, I divided everything into
"valid" and "undefined".  I see a good argument for further dividing
"undefined", though.  So I'll redefine my terms as follows:

  valid   -- The string has a unique, predictable result.  This is the
             same result that it will have in all future versions.
  invalid -- The string does not have a unique, predictable result.
      illegal   -- The string will never have a unique, predictable
                   result.
      undefined -- The string does not currently have a unique,
                   predictable result, but it may in a future version.

Is that acceptable terminology?  (I'll try to remember to stick to
it.)
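
If it helps, the terms could be written down as a small enumeration.
This is only a sketch; the names "Status" and "is_invalid" are invented
here and are not part of STminus or STNG.

```python
from enum import Enum


class Status(Enum):
    """Classification of a string under the formalization."""

    VALID = "valid"          # unique, predictable result, now and in future
    ILLEGAL = "illegal"      # will never have a unique, predictable result
    UNDEFINED = "undefined"  # no such result yet, but a future version may

    @property
    def is_invalid(self):
        # "invalid" is the umbrella term covering illegal and undefined.
        return self is not Status.VALID
```

Here "invalid" is not a fourth value: it is just the property shared by
"illegal" and "undefined" strings.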

-Edward