[Doc-SIG] Structured Text

Edward D. Loper edloper@gradient.cis.upenn.edu
Mon, 05 Mar 2001 20:46:41 EST


I've been going over the definitions of structured text (and its
various flavors), trying to see if I can formalize it even more than
Tibs did (http://homepage.ntlworld.com/tibsnjoan/STNG-format.html and
http://homepage.ntlworld.com/tibsnjoan/docutils/STpy.html)...  And a
number of questions came up.  I'm not sure if this is the correct
forum for such questions.  If not, I apologize, and would appreciate
it if you can tell me who I should be asking.  Anyway, my questions
were: 

1. Does every string value have an interpretation as a Structured
   Text?  That seems to be the case.  If so, is that a Good Thing?
   As an example of a string that we might not want to give a value,
   consider: 
   ||    indent level 0
   || 
   ||            indent level 1
   || 
   ||                    indent level 2
   || 
   ||        indent level ??

   I'd really prefer not to have cases like this have "undefined
   semantics."  It seems like we either need to specify what they
   mean, or say that they're illegal.

2. If it is true that every string value has an interpretation as a
   Structured Text, does it make sense to officially "discourage"
   certain types of strings, such as the example listed above?  It
   might also make sense to discourage strings like:
   ||    this
   ||      is
   ||  one messed up
   || paragraph

3. Which types of "code coloring" (emph, inline, etc.) can "wrap" over
   lines, and which can't?  E.g., can I have an *emph statement that
   continues to the next line?*

4. Is there any official precedence ordering on the different types of
   "code coloring"?  Will there be any time soon?  Any rules about what
   types of code coloring can be contained in what other types?

5. Does structural formatting or code coloring take precedence?  For
   example, if a paragraph starts with "* foo *", will it be a normal
   paragraph with an emphasized first element, or a list item?  (It'll 
   be much easier for me to write formal rules if structure takes
   precedence. ;) )

6. Among the list types, which take precedence?  For example, if a
   paragraph starts with "1. foo -- bar", is it an ordered list item
   or a descriptive list item?

7. What is meant by saying that SGML text passes through?  SGML is a
   meta-language for defining mark-up languages rather than a mark-up
   language itself, so I assume that the intent is something like
   "XML and HTML text passes through."  But does that mean that
   in an expression like '<TAG>a*b*</TAG>', the '*'s will be ignored?
   That seems unreasonably difficult to implement.  What about an
   expression like '<T a="*x*"/>'?  Does this mean I can't say things
   like 'if x<y *and* y>z'?  Is there strong support for the
   notion of letting "SGML" text pass through, or is it something that
   might be dropped?  (I would certainly vote for dropping it. :) )
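
To make question 1 concrete, here is a minimal Python sketch of one
way an implementation could assign indent levels and flag the
"indent level ??" case instead of leaving it undefined.  Everything
in it is an assumption for illustration: the function name
indent_levels is hypothetical, indentation is measured in leading
spaces only (tabs are not handled), and a line that dedents to a
column matching no open level is reported as None rather than being
given a meaning.  It is not taken from any existing StructuredText
implementation.

```python
def indent_levels(text):
    """Return (level, stripped_line) pairs for the non-blank lines of text.

    level is None when a line dedents to a column that matches no
    enclosing indentation (the "indent level ??" case).
    """
    stack = []    # indentation widths of the currently open levels
    result = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines only separate paragraphs
        width = len(line) - len(line.lstrip(" "))
        if not stack or width > stack[-1]:
            # Deeper than every open level: open a new one.
            stack.append(width)
            result.append((len(stack) - 1, line.strip()))
            continue
        # Same level or dedent: close every level deeper than this line.
        while stack and stack[-1] > width:
            stack.pop()
        if stack and stack[-1] == width:
            result.append((len(stack) - 1, line.strip()))
        else:
            # The column falls between two known levels (or left of all
            # of them): flag it instead of guessing.
            result.append((None, line.strip()))
    return result


example = (
    "    indent level 0\n"
    "\n"
    "            indent level 1\n"
    "\n"
    "                    indent level 2\n"
    "\n"
    "        indent level ??\n"
)
levels = [level for level, _ in indent_levels(example)]
```

For the example from question 1, levels comes out as
[0, 1, 2, None].  A formal specification would still have to say
whether None means "illegal" or maps to some defined level.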

My eventual goal, to the extent that it's possible, is to write out a
complete formal specification for StructuredText using something
similar to BNF (Backus-Naur Form).  (I'm pretty sure that vanilla BNF
is not powerful enough to capture StructuredText.)  After I've done
that, I'll start working on getting Emacs to colorize StructuredText
strings.  I'd also like to create a sort of test-suite set of strings
to test how different implementations function on different
"ambiguously defined" cases.

Any help and/or pointers are very much appreciated. :)
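
As a rough idea of what the test-suite could look like, here is a
small Python sketch built around the two ambiguous paragraphs from
questions 5 and 6.  The reference classifier assumes that structure
takes precedence over code coloring and that list types are checked
in a fixed order; both are assumptions, not settled rules.  The
names STRUCTURE_RULES, classify, AMBIGUOUS_CASES and run_suite are
hypothetical, and the regular expressions are deliberately
simplified rather than a faithful StructuredText grammar.

```python
import re

# Checked in order, so earlier entries win.  This ordering is an
# assumption made for the sketch, not an agreed precedence.
STRUCTURE_RULES = [
    ("bullet", re.compile(r"^\s*[*\-]\s+\S")),
    ("ordered", re.compile(r"^\s*\d+\.\s+\S")),
    ("descriptive", re.compile(r"^\s*\S.*\s--\s+\S")),
]


def classify(paragraph):
    """Classify a paragraph by its first line, structure first."""
    lines = paragraph.strip().splitlines()
    first_line = lines[0] if lines else ""
    for kind, pattern in STRUCTURE_RULES:
        if pattern.match(first_line):
            return kind
    return "paragraph"


# Each case pairs an ambiguous string with the readings that seem
# plausible; an implementation should pick exactly one of them.
AMBIGUOUS_CASES = [
    ("* foo *, then more text", ("bullet", "paragraph")),
    ("1. foo -- bar", ("ordered", "descriptive")),
]


def run_suite(classifier, cases=AMBIGUOUS_CASES):
    """Return (text, chosen, is_plausible) for every test case."""
    results = []
    for text, candidates in cases:
        chosen = classifier(text)
        results.append((text, chosen, chosen in candidates))
    return results
```

Running run_suite with a different implementation's classifier in
place of classify would show where the implementations disagree.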

-Edward