[Doc-SIG] formalizing StructuredText

Edward D. Loper edloper@gradient.cis.upenn.edu
Fri, 16 Mar 2001 11:00:30 EST


> I'm not sure. The obvious reason for allowing empty list items is to
> allow people to start a list and fill things in later - possibly even
> easier to argue with descriptive lists::
> 
> 	fish -- eat them on Friday
> 	dogs -- don't eat them in western countries
> 	snakes --
> 	horses -- OK in France
> 
> To be honest, I'm not even sure what the current docutils code does. Not
> that that matters too much, since it would only need an RE tweaking.

Ok.  For now, STminus will say that empty list items are valid.
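To make "empty list items are valid" concrete, here is a minimal Python sketch of a descriptive-list pattern whose body is optional. The regex and the name `parse_descriptive` are illustrative assumptions, not the actual docutils or STminus code. The whitespace around "--" is treated as structural and dropped.

```python
import re

# Hypothetical pattern for a descriptive list item "key -- body".
# The body group is optional, so "snakes --" is accepted as an empty item.
# Whitespace around "--" is structural and is not captured.
DESCRIPTIVE_ITEM = re.compile(r"^\s*(?P<key>\S.*?)\s+--(?:\s+(?P<body>.*\S))?\s*$")

def parse_descriptive(line):
    """Return (key, body), with body == '' for an empty item, or None."""
    m = DESCRIPTIVE_ITEM.match(line)
    if m is None:
        return None
    return m.group("key"), m.group("body") or ""

for line in ["\tfish -- eat them on Friday", "\tsnakes --", "\thorses -- OK in France"]:
    print(parse_descriptive(line))
```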

> Given I do support an empty '::' paragraph

Although that gets treated as a special case..  The empty paragraph
doesn't get preserved..

> >     * Apostrophes can appear in the middle of a word or at the end
> >       of a word, like "isn't" and "dogs'".  Is it illegal to have
> >       multiple apostrophes in the same word?
> 
> As in "*should* it be illegal"? Don't know - I would tend to think so
> pending proof otherwise, unless my current code doesn't care (!)

Yeah, all of my questions were "should" questions.

> >       There are no English
> >       words that use multiple apostrophes, but I'm not sure about
> >       other languages (although there are probably some languages
> >       that have words with apostrophes at the beginning of a word,
> >       ("'til"?) and StructuredText clearly won't deal with those..)
> 
> Oh dear - but English has "'phone" and various Yorkshire-style things
> which start with an apostrophe - sounds like another thing I need to
> check out.

I think we're just going to need to give up on word-initial apostrophes.
I don't see any way around it.  How else can you distinguish the
initial apostrophe in 'phone from the apostrophe in the literal 'phone'?
Or even the word: 'phones' (possessive of 'phones)
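The ambiguity can be shown with a toy rule (an assumption for illustration, not STpy's or STNG's actual regex): an apostrophe may open a literal only at the start of a word, and close one only at the end of a word. Word-internal apostrophes like "isn't" are then safe, but 'phone, 'phones' and cats' still misfire.

```python
import re

# Assumed rule, for illustration only: an apostrophe can open a literal
# only when NOT preceded by a word character, and close one only when
# NOT followed by a word character.  Word-internal apostrophes ("isn't")
# can therefore never delimit a literal.
LITERAL = re.compile(r"(?<!\w)'([^'\n]+)'(?!\w)")

def find_literals(text):
    """Return the contents of every apostrophe-delimited literal."""
    return LITERAL.findall(text)

print(find_literals("use the 'phone' module"))
print(find_literals("I'll 'phone the cats' owner"))
```

With this rule the second call returns `['phone the cats']`: a word-initial and a word-final apostrophe pair up into a literal nobody intended, and `'phones'` on its own is indistinguishable from a literal.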

> Trailing whitespace is removed right at the start. "structural"
> whitespace like that between a descriptive list "key" and the hyphens is
> lost. Whitespace comprising newline and indentation is conflated to a
> single space (except in literal paragraphs). How whitespace in literal
> strings is rendered (i.e., as &nbsp; equivalents, or not) is *probably*
> left as an issue for the implementor of the renderer (I haven't decided
> yet, for STpy).

Whitespace in literals and inlines should be preserved.
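The normalization described in the quoted paragraph can be sketched as follows. This is an assumed reading of that description, not the docutils implementation: trailing whitespace goes first, then each newline plus its indentation collapses to one space. Runs of spaces inside a line are untouched, so whitespace in literals and inlines survives.

```python
import re

def normalize_paragraph(lines):
    """Join a (non-literal) paragraph's lines into a single string.

    1. Strip trailing whitespace from every line.
    2. Replace each newline and the indentation after it with one space.
    Spaces *within* a line are left exactly as written.
    """
    stripped = [line.rstrip() for line in lines]
    joined = "\n".join(stripped)
    return re.sub(r"\n[ \t]*", " ", joined).lstrip()

print(repr(normalize_paragraph(["  A 'two  space' literal   ", "  continues here."])))
```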

> >     * Can #inline# expressions contain newlines?  I assume not
> >       ('literal' expressions can't.)
> 
> Oh yes they can, in docutils, and that's because I want them to, which
> means that in STpy they can as well (although it's obviously not
> documented yet). This is the sort of issue that only comes up when
> implementing (and I count your producing an EBNF as implementing, I
> guess, in this case).

Hm.  ick.  I don't like that.

> Reasoning - implementation first. When reassembling lines back into
> paragraph text, it is easy to either reinsert line breaks (and maybe
> indentation again) or just a single space. If you want to make RE
> handling easier, a single space wins big time. But that means you can't
> stop literal strings spanning newlines. Oh.

I can see ways around that.  But for now, I'll just say that I
think that what "makes more sense" in this case should trump what
"is easy to implement"... (also, if you want to be more compatible
with STNG, they don't allow newlines in literals.. :) )
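The implementation argument above is easy to see in a few lines of Python. Assuming a simple, hypothetical literal pattern (single quotes, no embedded quote or newline), joining a paragraph's lines with a single space lets a literal span the original line break, while matching line by line does not:

```python
import re

# Hypothetical literal pattern: 'text' with no quote or newline inside.
LITERAL = re.compile(r"'([^'\n]+)'")

def literals_if_lines_joined(lines):
    # Lines reassembled with a single space: the newline is gone, so a
    # literal opened on one line can close on the next.
    return LITERAL.findall(" ".join(line.strip() for line in lines))

def literals_if_single_line(lines):
    # Literals matched line by line: a quote left open at the end of a
    # line never matches anything.
    return [lit for line in lines for lit in LITERAL.findall(line)]

lines = ["call 'foo", "bar' now"]
print(literals_if_lines_joined(lines), literals_if_single_line(lines))
```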

> Philosophical second. OK - I hadn't thought of that (goes I). But I *had*
> been being irritated by trying to use pseudo-STpy in my emails, 'cos
> (heh, another "'" at the start of a word!) I'm using Outlook <fx:spit>
> in which it is very hard to tell where lines will be broken, which means
> use of '..' is hard. And since an email variant of STpy is both easy to
> imagine and should be easy to do, this is a pity. But if the behaviour
> of *all* quoted things over linebreaks is well defined, and the same
> (and the implementation above is that natural usage) then the problem
> goes away.

There will always be problems using ST if you can't control your
own line breaks: you'll get list items where you don't want
them, etc..

I like making literals single-line because:
    * it tends to keep them short, which is a good thing
    * we can handle whitespace in a more sensible way -- spaces are
      preserved, and are hard.  That way if someone wants to talk
      about the python string #'[  ]'#, they can.
    * apostrophes already seem dangerous to me, what with words like
      'cos that can accidentally start literals and words like 
      cats' that can accidentally end them.  If literals don't
      span multiple lines, then a parser has a much better chance
      of noticing that something's wrong.
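If inline expressions are single-line, a scanner can both keep their spaces exactly and flag a line whose delimiter is left open. A rough sketch, assuming '#' delimiters and treating an odd number of '#' on a line as suspicious (a heuristic chosen here, not something either parser is known to do):

```python
import re

# Sketch, assuming #inline# expressions may not span lines.
# Internal spaces are kept exactly as written.
INLINE = re.compile(r"#([^#\n]+)#")

def scan_inlines(text):
    """Return (inlines, warnings); warnings lists 1-based line numbers
    with an odd number of '#', i.e. a probably unclosed inline."""
    inlines, warnings = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        inlines.extend(INLINE.findall(line))
        if line.count("#") % 2:
            warnings.append(lineno)
    return inlines, warnings

print(scan_inlines("the python string #'[  ]'#, they can.\nan #unclosed one"))
```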

> Incidentally, that is also why I'm not sure yet about whether spaces in
> string literals should be "hard" or "soft". I'm still thinking on it (I
> tend towards "hard", but worry about 'very long string literals which
> will not fit on a single line when being rendered and thus look really
> stupid going off the right hand margin').

I don't see why someone would ever really need a very long literal..
And if they don't mind it being broken up, they can split it up 
themselves..
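For HTML output, "hard" spaces would most naturally mean `&nbsp;`. A minimal sketch, assuming an HTML renderer and a hypothetical `render_literal_html` helper. Note that `&nbsp;` also forbids line breaks inside the literal, which is exactly the long-literal margin problem raised above.

```python
import html

def render_literal_html(literal, hard_spaces=True):
    """Render a literal as an HTML <code> element.

    With hard_spaces, every space becomes &nbsp;, so runs of spaces
    survive and the browser will not wrap inside the literal.
    """
    escaped = html.escape(literal, quote=False)  # escapes &, <, >
    if hard_spaces:
        escaped = escaped.replace(" ", "&nbsp;")
    return "<code>" + escaped + "</code>"

print(render_literal_html("'[  ]'"))
```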

> >     * What are valid expressions for starting an ordered list item?
[...]
> >       And it does not allow for expressions like:
> >
> >           1.2. This is a list item.
> 
> I thought it *did* allow an optional dot? Oh well, memory again. The
> requirement for a dot in STpy was specifically to stop that problem.

Used to.  Doesn't now.  Who knows if/when/how it'll change. :)

> That one shouldn't fly in STpy, because it
> would have to be::
> 
> 	i.t.e.m. (Since
> 
> because it's meant to be "one letter, or one or more digits"

Hm.  So no roman numerals in STpy?  ok.
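The STpy rule as quoted (one letter, or one or more digits, then a dot) can be written as a regex. This is a sketch of that description, assuming whitespace must follow the dot; it is not taken from the STpy source.

```python
import re

# Sketch of the rule as described: one letter OR one or more digits,
# then a dot, then whitespace (the whitespace requirement is assumed).
STPY_ENUM = re.compile(r"^\s*([A-Za-z]|[0-9]+)\.\s+")

def stpy_enumerator(line):
    """Return the enumerator token ('1', 'b', ...) or None."""
    m = STPY_ENUM.match(line)
    return m.group(1) if m else None

for line in ["1. First", "b. second", "1.2. This is a list item.", "iv. roman"]:
    print(repr(line), stpy_enumerator(line))
```

Under this reading "1.2.", "i.t.e.m." and multi-letter roman numerals such as "iv." are all rejected.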

> >       So maybe we could just use "([0-9]+\.)+"?
> 
> Personally, I wouldn't much mind if it were only letters and "arabic"
> ("indian"?) digits. The reason for having all three forms is that a
> renderer MIGHT use the form the user used to decide what form the
> rendering should use (and those three forms are common to all list
> formatters in common use). 

I don't think that people documenting modules will ever really care.
Also, it seems like this might get rendered differently by different
formatters, anyway.  I've been using LaTeX for a long time, without
ever feeling the need to tweak which ordered bullets it decides to
use..  I have a feeling that the same is true of most people..

> Given that's something people might care
> about, it makes sense (of course, I believe ST<other> implementations
> have tended NOT to make such use of the forms, but still).

I doubt any implementation ever will, either.. :)
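For comparison, here is Edward's "([0-9]+\.)+" suggestion, plus a guess at how a renderer could classify the enumerator form. Both functions are hypothetical; the anchoring and trailing whitespace are assumptions added here. Single letters such as "i" or "c" are inherently ambiguous between roman and alphabetic.

```python
import re

# Edward's suggested pattern, anchored and followed by whitespace
# (the anchoring and whitespace are assumptions added here).
NESTED_ENUM = re.compile(r"^\s*((?:[0-9]+\.)+)\s+")
ROMAN = re.compile(r"^[ivxlcdm]+$", re.IGNORECASE)

def nested_levels(line):
    """'1.2. text' -> [1, 2]; None if the line is not an enumerated item."""
    m = NESTED_ENUM.match(line)
    if m is None:
        return None
    return [int(part) for part in m.group(1).split(".") if part]

def enumerator_form(token):
    """Guess the style a renderer might reuse: 'arabic', 'roman', 'alpha'.

    Roman is checked before alpha, so ambiguous letters such as 'i',
    'v' or 'c' come back as 'roman'.
    """
    if token.isdigit():
        return "arabic"
    if ROMAN.match(token):
        return "roman"
    if len(token) == 1 and token.isalpha():
        return "alpha"
    return None

print(nested_levels("1.2. This is a list item."), enumerator_form("iv"))
```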

> >     * What restrictions are there on hrefs ("name":http://some.url)?
> >       According to STNG, they can use relative URLs ("name":whatever).
> >       These end up being pretty tricky to formalize..
> >
> >         * Can href names span multiple lines?
> >         * Can href names contain coloring? (I'd like to say no)
> >         * Should the string '":' only be allowed for hrefs?
> >           Or maybe '":(?!\s)', so you can say "this": that?
> >         * What do you do with things like::
> >
> >             This *is "too* confusing":http://some.url
> >
> >           (Keeping in mind that things like this should be ok)::
> >
> >             Normally *quotes " don't have* any special meaning,"
> >             so they don't have to nest properly..
> 
> Hah - URLs (URIs?) are impossible to do right in ST (of whatever form).
> There's a reference in TextRE.py to a page that describes the problems.

Hrm..  My questions were actually more concerned with the name part
than with the url part.  I assume that you're using the same basic 
markup here that STNG does (from reading STpy.html, it seems like
you are).  So which of the following are legal? ::

    "Here the *name* 'contains' markup":url

    "This name spans multiple
    lines":url

    "the following is not a url":<hi>

    Do *quotes "have to* nest" properly with coloring?
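Here is one possible set of answers to these questions, written as a regex. This is an assumption for illustration, not STpy's or STNG's rule: names may not span lines or contain '"', and '":' only starts an href when followed by a non-space character (the '":(?!\s)' variant). It says nothing about coloring inside names or about whether the URL part is valid; those would need separate checks.

```python
import re

# Assumed rule: the name is a run of non-quote characters on ONE line,
# and '":' starts an href only when followed directly by a non-space
# character.  Markup inside the name is not rejected here.
HREF = re.compile(r'"([^"\n]+)":(?!\s)(\S+)')

def find_hrefs(text):
    """Return a list of (name, url) pairs."""
    return HREF.findall(text)

print(find_hrefs('"This name spans multiple\nlines":url'))
print(find_hrefs('"the following is not a url":<hi>'))
```

Here the multi-line name and `"this": that` produce no href, while `"the following is not a url":<hi>` still matches, so URL validation would have to happen afterwards.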

> 1. If it looks vaguely like a URL, expect it to be mistaken
>    for one (and "vaguely" will, of course, be defined)

I assume you mean this for when they just include an absolute url
in the text, like http://foo.bar .

> 2. URLs will not be allowed to span multiple lines, and if
>    they contain spaces (and maybe some other characters)
>    then those will need to be encoded (in the "normal"
>    manner)

Agreed.. 

> 3. Colouring does not occur in URLs.

Agreed.. Although how you decide where the url ends isn't obvious..
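Deciding where a bare URL ends can only be heuristic. A hypothetical sketch: match a scheme followed by "://" and then everything up to whitespace, and strip trailing punctuation that more likely belongs to the sentence. It will wrongly trim a URL that legitimately ends in ')' or '.'.

```python
import re

# Hypothetical heuristic for bare absolute URLs such as http://foo.bar.
# Scheme syntax follows RFC 3986: a letter, then letters/digits/+/-/.
BARE_URL = re.compile(r"\b[a-zA-Z][a-zA-Z0-9+.-]*://\S+")
# Trailing characters assumed to belong to the surrounding sentence.
TRAILING = ".,;:!?)'\""

def find_bare_urls(text):
    """Return bare URLs with trailing sentence punctuation removed."""
    return [m.group(0).rstrip(TRAILING) for m in BARE_URL.finditer(text)]

print(find_bare_urls("see http://foo.bar/x.html, then"))
```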

-Edward