[Doc-SIG] Formalizing StructuredText (yeh!)

Edward D. Loper edloper@gradient.cis.upenn.edu
Mon, 12 Mar 2001 14:36:23 EST


> BUT I think that it would be better to strategically insert references
> to STminus at "the appropriate places" in the Zwiki (mind you, I've
> still to figure out how the people using Wikis target new stuff that
> they should be looking at).

I added a page to Zope (under CurrentIssues), and I'll try to actually
put more content there when I can. :)

> 
> > I have a suspicion that STminus's current definition does *not*
> > actually provide a subset of the intersection of STNG and STpy.
> 
> I think it's very close, actually.

I guess it depends on how much the implementations diverge from
the intentions.  At least for STNG, there are a number of current
differences.  I've been writing a large test set, and plan to
post a link to it, and to the results of running STminus on it,
later today (it still needs a little more work).  At that point, I'm
hoping we can get a better idea of whether STNG and STpy really
act like STminus.  (My guess is that most of the differences are
unintentional.)

> > (although, just because one parses something one way,
> > doesn't mean that that's the intended behavior, but..)
> 
> Well...
> 
> In the case of STminus, I think we could say that it does (!)

Definitely, since STminus is implemented directly from its formal
definition. 

> In the case of STpy, if it does something surprising then either that's
> a bug, or an unforeseen consequence of something that *isn't* a bug, in
> which case it either needs designing around or explaining.
> I doubt that STNG is too much different (although I suspect they prefer
> the "explain around" to the "change" mechanism - stability (of some
> sort) over perceived complexity).

I'm hoping that STNG will be willing to make at least a few changes.
For example, changing 'x*y' and 'y*z' to be parsed as two literals
rather than one emph area.
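
To make the 'x*y' and 'y*z' case concrete, here is a minimal Python
sketch.  It assumes a much-simplified inline syntax ('text' is a
literal, *text* is emphasis); the names INLINE, EMPH_ONLY, and
inline_spans are hypothetical, and none of this is taken from STNG,
STpy, or STminus.  It only shows how the result depends on whether
literals are recognized before emphasis.

```python
import re

# Assumed, simplified inline syntax (not the real STNG/STpy rules):
#   'text'  is a literal, *text* is emphasis.
# Literals are listed first, so at any position they win over emphasis.
INLINE = re.compile(r"'(?P<literal>[^'\n]+)'|\*(?P<emph>[^*\n]+)\*")

# Emphasis matched on its own, with no knowledge of literals.
EMPH_ONLY = re.compile(r"\*([^*\n]+)\*")


def inline_spans(text):
    """Return (kind, content) pairs for each inline span, left to right."""
    return [(m.lastgroup, m.group(m.lastgroup)) for m in INLINE.finditer(text)]


sample = "'x*y' and 'y*z'"
literals_first = inline_spans(sample)
emphasis_only = EMPH_ONLY.search(sample).group(1)
```

With literals taken first, literals_first holds two literal spans,
"x*y" and "y*z".  Matching emphasis alone instead yields the single
span "y' and 'y", which is the one-emph-area reading.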

> (right - so STpy/docutils is going to be the Common Lisp of ST, STminus
> is clearly Scheme, so what is STNG? I feel it is likely to have more
> "unexpected results" because of a wish for a faster engine - Tcl? no,
> that's unfair)

Well, I'm hoping that all 3 at least have well-defined results *where
they do define the results*.  I certainly hope that STNG will only
give "unexpected results" for a small subclass of strings.. :)

> One comment - when defining *StructuredText*, you say "all paragraphs
> and list items be separated by at least two blank lines" - shouldn't
> that be "one blank line", which is "two newlines"?

The text was wrong, the production was (I think) correct.  It 
should have said that paragraphs are separated by "at least one
blank line."  I'll fix it..

> I tend to think of the STNG document structure as being:
> 
> BlankLine = S* NL
> TextLine  = <<informally, a line with text in it>> NL
> Paragraph = TextLine+
> StructuredTextNG = BlankLine* Paragraph
>                    (BlankLine+ Paragraph)*
>                    BlankLine*

Almost..  But since I define paragraphs *not* to include their
trailing newline, you need at least two *newlines* between paragraphs.
Also, the way you wrote it, there's an ambiguity as to whether
trailing spaces belong to the paragraph or to the blank line.  (I
wrote my implementation of ebnfla so it checks for all possible
ambiguities, so this type of thing is easier to detect when you're
actually playing with rules).  Finally, your rule doesn't allow
for strings that contain a final blank line that isn't terminated
by a newline.  I'll try to give a better explanation of the 
rule in http://www.cis.upenn.edu/~edloper/pydoc/stminus-001.html
when I get a chance.
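
As a rough illustration of the separator convention described above
(paragraphs exclude their trailing newline, so a break is a newline
followed by at least one blank line), here is a minimal Python
sketch.  It is not the STminus or ebnfla implementation: the name
split_paragraphs is hypothetical, and treating S as "space or tab" is
an assumption.

```python
import re

# A paragraph break: NL (S* NL)+, i.e. the newline that ends the
# paragraph's last line, then one or more blank lines.
# Assumption: S is a space or a tab.
SEPARATOR = re.compile(r"\n(?:[ \t]*\n)+")


def split_paragraphs(text):
    """Split text into paragraphs that exclude their trailing newline."""
    # Drop leading blank lines, (S* NL)*.
    body = re.sub(r"\A(?:[ \t]*\n)*", "", text)
    # Drop trailing (NL S*)*; this also covers a final blank line
    # that is not terminated by a newline.
    body = re.sub(r"(?:\n[ \t]*)*\Z", "", body)
    if not body:
        return []  # the empty document is accepted
    return SEPARATOR.split(body)
```

For example, split_paragraphs("a\nb\n\nc") gives ['a\nb', 'c'], and
split_paragraphs("a  \n\nb") gives ['a  ', 'b']: because the break
starts at the newline, trailing spaces on a text line stay with the
paragraph, which is one way to settle the ambiguity.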

> This assumes that the empty document (only consisting of zero or more
> blank lines) isn't allowed, which makes the production simpler - I
> suppose that first "BlankLine* Paragraph" could become a "?" group (0 or
> 1 occurrences) if needs be...

I assumed that the empty document was acceptable, unless there's a
reason to make it unacceptable.

> It's that first paragraph that's the problem - if it had to have a
> starting blank line life would be a lot simpler (indeed, I think STNG
> solves this problem by pretending it does!).

If we had a starting break, we would still need::

  StructuredText = (S* NL)*
                   (NL (S* NL)+ Paragraph | NL (S* NL)+ ListItem)* 
                   (NL S*)*

instead of::

  StructuredText = (S* NL)* (Paragraph | ListItem)?
                   (NL (S* NL)+ Paragraph | NL (S* NL)+ ListItem)* 
                   (NL S*)*

> Surely for the common-denominator, you don't need to separate out list
> items from paragraphs in this production?

I separated out list items from paragraphs to make it easier to
replace the rule for STpy.  We could very easily say instead::

  StructuredText = (S* NL)* Entity?
                   (NL (S* NL)+ Entity)*
                   (NL S*)*
  Entity = Paragraph | ListItem

If you think that's easier to read.  It defines the same language, so
it doesn't really matter to me. :)
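
A quick way to sanity-check that the Entity form and the separated
form accept the same strings is to translate both into regular
expressions and compare them on every short string.  The Python
sketch below does that under loud assumptions: "P" stands in for a
whole Paragraph, "I" for a whole ListItem, S is a single space, and
the names SEPARATE, ENTITY, and same_language are hypothetical.  A
bounded brute-force check like this is evidence, not a proof.

```python
import itertools
import re

# Stand-ins (assumptions): "P" = a whole Paragraph, "I" = a whole
# ListItem, " " = S, "\n" = NL.
SEPARATE = re.compile(
    r"(?: *\n)*(?:P|I)?"
    r"(?:\n(?: *\n)+P|\n(?: *\n)+I)*"
    r"(?:\n *)*"
)
ENTITY = re.compile(
    r"(?: *\n)*[PI]?"
    r"(?:\n(?: *\n)+[PI])*"
    r"(?:\n *)*"
)


def same_language(max_len=6, alphabet="PI \n"):
    """Compare both forms on every string up to max_len characters."""
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(SEPARATE.fullmatch(s)) != bool(ENTITY.fullmatch(s)):
                return False
    return True
```

Calling same_language() returns True for the bound used here.  Both
forms accept "P\n\nI" and the empty string, and both reject "P\nI",
since a single newline is not enough to separate two entities.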

> Or, if you do:
> 
> Paragraph = ( ListItem | TextLine )
>             TextLine*

I tried to make all of my productions correspond to their actual 
entities..  So you shouldn't need to do (much) postprocessing on
the output of STminus.  For example, the Paragraph production 
should give an entire paragraph, not just its first line.  I think
I may add ULItem, OLItem, and DLItem productions for similar 
reasons (without changing the language defined by the productions).

> Hmm. Anyway, I'm still very impressed - keep up the good work!

Thanks!  You've done some impressive work, yourself.