[Doc-SIG] formalizing StructuredText

Edward D. Loper edloper@gradient.cis.upenn.edu
Thu, 15 Mar 2001 16:37:46 EST


I've been working on expanding the domain of STminus (a formalized
version of StructuredText, expressed in an EBNF variant), and the
following questions came up.  (Some of them may not make much
sense if you're not familiar with StructuredText.)  These are
generally not questions that have "correct" answers, so I'd like
to know what people think STminus should do.  (Of course, I'm also
interested in what STpy and STNG have to say about these things.)

    * Are list items required to have contents?  I.e., can a list 
      item be just a bullet?  This only makes sense to me if you
      used it in an environment like::

          1. 

               text...

          2.

               text...

    * Apostrophes can appear in the middle of a word or at the end
      of a word, as in "isn't" and "dogs'".  Is it illegal to have
      multiple apostrophes in the same word?  Very few English words
      use more than one apostrophe ("fo'c'sle" is one), and I'm not
      sure about other languages.  (Some words even begin with an
      apostrophe, such as the English "'til", and StructuredText
      clearly won't deal with those.)

    * When parsing various structures, like paragraphs and list
      items and bold items, what whitespace is kept?  E.g., if I
      were to export to XML, would the trailing whitespace on
      paragraphs be included?  Or the whitespace between a
      description list key and the hyphen?

    * Can #inline# expressions contain newlines?  I assume not,
      since 'literal' expressions can't.

    * What are valid expressions for starting an ordered list item?
      Currently STNG uses "([a-zA-Z]+\.)|([0-9]+\.)|([0-9]+\s+)",
      i.e., a series of letters followed by a dot, a series of
      digits followed by a dot, or a series of digits followed by
      whitespace.
      This seems wrong to me, because it implies that the following
      are ordered list items::

          Hi.  This is a list item.

          12 is a fun number.

      And it does not allow for expressions like::

          1.2. This is a list item.

      Also, note that in STpy variants (which will include my
      proposed markup for formatted docstrings), a list item can
      begin on the line right after a paragraph, with no intervening
      blank line.  So we would get::

          The first line is a paragraph but the second line is a list
          item.  (Since it starts with letters followed by a dot)

      Even if we restrict letter bullets to Roman numerals, we still
      have problems::

          Hopefully someone who can figure this out who is smarter than
          I.  But I don't see a way to use roman numerals safely..

      So maybe we could just use "([0-9]+\.)+"?

    * What restrictions are there on hrefs ("name":http://some.url)?
      According to STNG, they can use relative URLs ("name":whatever).
      These turn out to be pretty tricky to formalize.

        * Can href names span multiple lines?
        * Can href names contain coloring?  (I'd like to say no.)
        * Should the string '":' be reserved for hrefs?  Or only
          '":(?!\s)' (a '":' not followed by whitespace), so that
          you can still write "this": that?
        * What do you do with things like::

            This *is "too* confusing":http://some.url

          (Keeping in mind that things like this should be ok)::

            Normally *quotes " don't have* any special meaning,"
            so they don't have to nest properly..
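
On the first question (bullet-only list items), one option is to let the text after the bullet be empty and treat such an item's contents as coming from the indented paragraphs that follow.  A minimal Python sketch under that assumption; the pattern, the bullet set, and the function name are hypothetical, not anything STNG or STpy defines:

```python
import re

# Hypothetical line-level rule (an assumption, not STNG or STpy behaviour):
# a bullet ('*' or digits followed by a dot), optional spaces or tabs, then
# whatever text remains on the line, which may be empty.
LIST_ITEM = re.compile(r"(?P<bullet>\*|[0-9]+\.)[ \t]*(?P<rest>.*)")

def parse_list_item_line(line):
    """Return (bullet, rest) for a list-item line, or None if it is not one.

    An empty `rest` marks a bullet-only item whose contents would have to
    come from the indented paragraphs that follow it.
    """
    m = LIST_ITEM.fullmatch(line.strip())
    if m is None:
        return None
    return m.group("bullet"), m.group("rest")
```

With this rule, "1. " yields ("1.", ""), which a grammar could then require to be followed by an indented block.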
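
For the apostrophe question, the two candidate rules are easy to state as regular expressions: a strict one that allows at most one apostrophe (inside the word or at its end), and a permissive one that allows any number of internal apostrophes plus an optional trailing one.  This Python sketch assumes ASCII letters only, and the names are hypothetical:

```python
import re

# Hypothetical STminus word rules (assumptions for discussion, not a spec).
# Strict: letters, then optionally either one internal apostrophe followed by
# more letters, or one trailing apostrophe, so at most one apostrophe total.
STRICT_WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+|')?")
# Permissive: any number of internal apostrophes, plus an optional trailing one.
PERMISSIVE_WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*'?")

def is_word(text, pattern):
    """Return True if the whole of `text` is a single word under `pattern`."""
    return pattern.fullmatch(text) is not None

for w in ["isn't", "dogs'", "fo'c'sle", "'til"]:
    print(w, is_word(w, STRICT_WORD), is_word(w, PERMISSIVE_WORD))
```

Both rules accept "isn't" and "dogs'", only the permissive one accepts "fo'c'sle", and neither accepts a leading apostrophe as in "'til".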
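
One possible answer to the whitespace question is to treat whitespace at the edges of a paragraph, and around a description-list key and its separator, as insignificant, so it never reaches an XML export.  The Python sketch below assumes the "key -- description" form of description list items; the function names and the policy itself are hypothetical, not a description of what STNG or STpy actually do:

```python
import re

# Hypothetical whitespace policy (an assumption): in "key -- description",
# whitespace before the key, around the "--", and after the description is
# dropped.  DOTALL lets the description span several lines.
DESCRIPTION_ITEM = re.compile(
    r"\s*(?P<key>.+?)\s+--\s+(?P<body>.*?)\s*", re.DOTALL)

def split_description_item(text):
    """Return (key, body) with surrounding whitespace dropped, or None."""
    m = DESCRIPTION_ITEM.fullmatch(text)
    if m is None:
        return None
    return m.group("key"), m.group("body")

def normalize_paragraph(text):
    """Collapse internal whitespace runs to one space; drop leading/trailing whitespace."""
    return " ".join(text.split())
```

The alternative, keeping all whitespace verbatim, would mean the exporter has to decide later which runs are significant, which seems harder to specify in the grammar.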
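
For the #inline# question, if neither #inline# nor 'literal' expressions may contain a newline, the rule reduces to a character class that excludes both the closing delimiter and "\n".  A minimal Python sketch under that assumption (the pattern and function names are hypothetical):

```python
import re

# Hypothetical rule: '#', then one or more characters that are neither '#'
# nor a newline, then a closing '#'.  A 'literal' rule would look the same
# with "'" in place of '#'.
INLINE = re.compile(r"#([^#\n]+)#")

def find_inline(text):
    """Return the contents of every #inline# expression in `text`."""
    return INLINE.findall(text)
```

An expression whose closing '#' falls on the next line is simply not recognized, so both '#' characters would be left as ordinary text.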
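
The ordered-list examples can be checked mechanically.  This Python sketch applies the STNG pattern quoted above and the proposed "([0-9]+\.)+" at the start of each example line, reporting what each would treat as the bullet; the helper name is hypothetical:

```python
import re

# The STNG pattern quoted above and the proposed replacement, both applied
# at the start of a line with re.match.
STNG_BULLET = re.compile(r"([a-zA-Z]+\.)|([0-9]+\.)|([0-9]+\s+)")
PROPOSED_BULLET = re.compile(r"([0-9]+\.)+")

def bullet(pattern, line):
    """Return the text `pattern` would treat as an ordered-list bullet, or None."""
    m = pattern.match(line)
    return m.group(0) if m else None

examples = [
    "Hi.  This is a list item.",
    "12 is a fun number.",
    "1.2. This is a list item.",
    "item.  (Since it starts with letters followed by a dot)",
]
for line in examples:
    print(repr(line), bullet(STNG_BULLET, line), bullet(PROPOSED_BULLET, line))
```

The STNG pattern takes "Hi.", "12 ", "1." and "item." as bullets; the proposed one rejects the first, second and fourth lines and takes "1.2." whole.  It would still misread a sentence that starts with a number and a period, such as "2001. was a good year."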
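
For the href questions, here is one way to encode "names don't span lines" and the '":(?!\s)' idea in a single regular expression.  It is a sketch under stated assumptions: the name is any run of characters other than '"' and newline, and the target (absolute or relative) is just a run of non-whitespace characters.  The names are hypothetical:

```python
import re

# Hypothetical href rule (assumption for discussion): a double-quoted name
# with no quote or newline inside, then '":' not followed by whitespace,
# then a target made of non-whitespace characters.
HREF = re.compile(r'"([^"\n]+)":(?!\s)(\S+)')

def find_hrefs(text):
    """Return (name, target) pairs for every href in `text`."""
    return HREF.findall(text)
```

On 'This *is "too* confusing":http://some.url' this takes "too* confusing" as the name, crossing the emphasis boundary, so a "no coloring in href names" rule would have to reject that match explicitly.  The unbalanced-quote example produces no href because the quoted run ends at a newline.  Because the target is any non-whitespace run, a trailing period after a URL would be swallowed into it.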

Well, that's all for now.  I'll post more issues as they come up. :)

-Edward