[Doc-SIG] formalizing StructuredText

Tony J Ibbs (Tibs) tony@lsl.co.uk
Fri, 16 Mar 2001 10:50:39 -0000


Edward D. Loper wrote:
>     * Are list items required to have contents?  I.e., can a list
>       item be just a bullet?  This only makes sense to me if you
>       used it in an environment like::
>
>           1.
>
>                text...

I'm not sure. The obvious reason for allowing empty list items is to
allow people to start a list and fill things in later - possibly even
easier to argue with descriptive lists::

	fish -- eat them on Friday
	dogs -- don't eat them in western countries
	snakes --
	horses -- OK in France

To be honest, I'm not even sure what the current docutils code does. Not
that that matters too much, since it would only need an RE tweak.

Given I do support an empty '::' paragraph (although for, erm, odd
reasons) and I do tend to write incomplete lists when working on
documentation, I think I'm convincing myself.

(Mind you, I don't like your specific examples, 'cos they're laid out in
a way I don't like - but your examples become OK if my examples get
allowed, so...)
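
As a minimal sketch of the sort of RE tweak involved (this is not the
actual docutils code; the pattern and the name ``parse_desc_item`` are
hypothetical), making the text after the "--" optional is enough to
accept an empty descriptive list item:

```python
import re

# Hypothetical pattern: a key, then "--", then an OPTIONAL description.
DESC_ITEM = re.compile(r"^\s*(?P<key>\S.*?)\s+--(?:\s+(?P<text>.*\S))?\s*$")

def parse_desc_item(line):
    """Return (key, text) for a descriptive list item, or None.

    An item with nothing after the "--" yields an empty text string.
    """
    m = DESC_ITEM.match(line)
    if m is None:
        return None
    return m.group("key"), m.group("text") or ""

if __name__ == "__main__":
    for line in ["\tfish -- eat them on Friday", "\tsnakes --"]:
        print(parse_desc_item(line))
```

With this pattern "snakes --" parses as a key with empty text, while a
line containing no "--" at all is not treated as a list item.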

>     * Apostrophes can appear in the middle of a word or at the end
>       of a word, like "isn't" and "dogs'".  Is it illegal to have
>       multiple apostrophes in the same word?

As in "*should* it be illegal"? Don't know - I would tend to think so
pending proof otherwise, unless my current code doesn't care (!)

>       There are no English
>       words that use multiple apostrophes, but I'm not sure about
>       other languages (although there are probably some languages
>       that have words with apostrophes at the beginning of a word,
>       ("'til"?) and StructuredText clearly won't deal with those..)

Oh dear - but English has "'phone" and various Yorkshire-style things
which start with an apostrophe - sounds like another thing I need to
check out.

>     * When parsing various structures, like paragraphs and list
>       items and bold items, what whitespace is kept?  E.g., if I
>       were to export to XML, would the trailing whitespace on
>       paragraphs be included?  Or the whitespace between a
>       description list key and the hyphen?

Trailing whitespace is removed right at the start. "structural"
whitespace like that between a descriptive list "key" and the hyphens is
lost. Whitespace comprising newline and indentation is conflated to a
single space (except in literal paragraphs). How whitespace in literal
strings is rendered (i.e., as &nbsp; equivalents, or not) is *probably*
left as an issue for the implementor of the renderer (I haven't decided
yet, for STpy).
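
A rough sketch of those whitespace rules for a non-literal paragraph
(the function name is made up for illustration, and stripping the first
line's leading indentation is an assumption, on the grounds that it is
structural):

```python
import re

def normalise_paragraph(lines):
    """Join the lines of a non-literal paragraph into a single string."""
    # Trailing whitespace is removed right at the start.
    stripped = [line.rstrip() for line in lines]
    text = "\n".join(stripped)
    # A newline plus any following indentation becomes one space.
    text = re.sub(r"\n[ \t]*", " ", text)
    # Assumption: the first line's own indentation is structural, so drop it.
    return text.strip()

if __name__ == "__main__":
    print(normalise_paragraph(["  first line   ", "    second line\t", "third"]))
```

Whitespace inside a line is left alone; only line ends and indentation
are touched.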

>     * Can #inline# expressions contain newlines?  I assume not
>       ('literal' expressions can't.)

Oh yes they can, in docutils, and that's because I want them to, which
means that in STpy they can as well (although it's obviously not
documented yet). This is the sort of issue that only comes up when
implementing (and I count your producing an EBNF as implementing, I
guess, in this case).

Reasoning - implementation first. When reassembling lines back into
paragraph text, it is easy to either reinsert line breaks (and maybe
indentation again) or just a single space. If you want to make RE
handling easier, a single space wins big time. But that means you can't
stop literal strings spanning newlines. Oh.
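
To illustrate the point (a sketch only; the ``LITERAL`` pattern is a
deliberately naive stand-in and not the real TextRE.py expression): once
the lines have been rejoined with single spaces, a simple RE for quoted
literals will happily match text that was split over two lines in the
source.

```python
import re

# Naive, hypothetical pattern for a 'literal' string.
LITERAL = re.compile(r"'([^'\n]+)'")

def find_literals(lines):
    """Rejoin paragraph lines with single spaces, then look for literals."""
    text = " ".join(line.strip() for line in lines)
    return LITERAL.findall(text)

if __name__ == "__main__":
    para = ["use the 'very long", "    literal string' here"]
    # Line by line, neither half contains a complete literal...
    print([LITERAL.findall(line) for line in para])
    # ...but the rejoined paragraph does.
    print(find_literals(para))
```

Stopping the literal at the line break would mean keeping the line
breaks in the text, which is exactly what makes the RE handling harder.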

Philosophical second. OK - I hadn't thought of that (goes I). But I *had*
been being irritated by trying to use pseudo-STpy in my emails, 'cos
(heh, another "'" at the start of a word!) I'm using Outlook <fx:spit>
in which it is very hard to tell where lines will be broken, which means
use of '..' is hard. And since an email variant of STpy is both easy to
imagine and should be easy to do, this is a pity. But if the behaviour
of *all* quoted things over linebreaks is well defined, and the same
(and the implementation above is the natural usage) then the problem
goes away.

Incidentally, that is also why I'm not sure yet about whether spaces in
string literals should be "hard" or "soft". I'm still thinking on it (I
tend towards "hard", but worry about 'very long string literals which
will not fit on a single line when being rendered and thus look really
stupid going off the right hand margin').

>     * What are valid expressions for starting an ordered list item?
>       Currently STNG uses "([a-zA-Z]+\.)|([0-9]+\.)|([0-9]+\s+)"
>       i.e., a series of letters followed by a dot, a series of
>       numbers followed by a dot, or a number followed by space.
>       This seems wrong to me, because it implies that the following
>       are ordered list items::
>
>           Hi.  This is a list item.
>
>           12 is a fun number.
>
>       And it does not allow for expressions like:
>
>           1.2. This is a list item.

I thought it *did* allow an optional dot? Oh well, memory again. The
requirement for a dot in STpy was specifically to stop that problem.

>       Also, note that since in STpy variants (which will include
>       my proposed markup for formatted docstrings), list items can
>       begin without an intervening space.. So we would get::
>
>           The first line is a paragraph but the second line is a list
>           item.  (Since it starts with letters followed by a dot)

Erm - space -> blank line. That one shouldn't fly in STpy, because it
would have to be::

	i.t.e.m. (Since

because it's meant to be "one letter, or one or more digits"
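
A minimal sketch of that rule as an RE (the pattern and the name
``list_marker`` are hypothetical, and allowing repeated segments such as
"1.2." is an assumption read off the "i.t.e.m." example): each segment
is one letter or one or more digits, and each must be followed by a dot.

```python
import re

# One or more segments of (single letter | digits), each followed by a dot,
# then whitespace and the start of the item text.
ITEM_START = re.compile(r"^\s*((?:(?:[A-Za-z]|[0-9]+)\.)+)\s+\S")

def list_marker(line):
    """Return the ordered-list marker that starts the line, or None."""
    m = ITEM_START.match(line)
    return m.group(1) if m else None

if __name__ == "__main__":
    for line in ["1.2. This is a list item.",
                 "Hi.  This is a list item.",
                 "12 is a fun number.",
                 "item.  (Since it starts with letters followed by a dot)",
                 "I.  But I don't see a way to use roman numerals safely.."]:
        print(repr(list_marker(line)), "<-", line)
```

Requiring the dot rejects "Hi." and "12 is", and the single-letter limit
rejects "item.", but a lone "I." at the start of a line is still taken
as a marker.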

>       Even if we restrict ourselves to Roman numerals, we have
>       problems::
>
>           Hopefully someone who can figure this out who is
>           smarter than
>           I.  But I don't see a way to use roman numerals safely..

Hmm - yersss.

My tack on that one is, unfortunately, that it is a case of "so don't do
that" - the basic problem with ST is that there are *some* things one
can't do, because it is striving for naturalness otherwise. But on the
other hand:

>       So maybe we could just use "([0-9]+\.)+"?

Personally, I wouldn't much mind if it were only letters and "arabic"
("indian"?) digits. The reason for having all three forms is that a
renderer MIGHT use the form the user used to decide what form the
rendering should use (and those three forms are common to all list
formatters in common use). Given that's something people might care
about, it makes sense (of course, I believe ST<other> implementations
have tended NOT to make such use of the forms, but still).

>     * What restrictions are there on hfrefs ("name"://http:some.url)
>       According to STNG, they can use relative URLs ("name":whatever).
>       These end up being pretty tricky to formalize..
>
>         * Can href names span multiple lines?
>         * Can href names contain coloring? (I'd like to say no)
>         * Should the string '":' only be allowed for hrefs?
>           Or maybe '":(?!\s)', so you can say "this": that?
>         * What do you do with things like::
>
>             This *is "too* confusing":http://some.url
>
>           (Keeping in mind that things like this should be ok)::
>
>             Normally *quotes " don't have* any special meaning,"
>             so they don't have to nest properly..

Hah - URLs (URIs?) are impossible to do right in ST (of whatever form).
There's a reference in TextRE.py to a page that describes the problems.

The rules I'm working towards are probably going to be something like:

1. If it looks vaguely like a URL, expect it to be mistaken
   for one (and "vaguely" will, of course, be defined)
2. URLs will not be allowed to span multiple lines, and if
   they contain spaces (and maybe some other characters)
   then those will need to be encoded (in the "normal"
   manner)
3. Colouring does not occur in URLs.

Confusing examples I'll leave until after the alpha release, I'm afraid
(but the general rule is probably "if it's confusing, don't be surprised
if it is, indeed, confusing").
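
As a very rough sketch of rules 2 and 3 only (the pattern below is a
hypothetical placeholder, not what TextRE.py does, and it makes no
attempt at defining "vaguely"): the URL part is a single run of
non-whitespace on one line, it is captured raw so nothing in it gets
coloured, and spaces have to be encoded before they go in. It assumes
Python 3's urllib.

```python
import re
from urllib.parse import quote

# Hypothetical: "name" on one line, then ':' followed immediately by the URL.
HREF = re.compile(r'"([^"\n]+)":(\S+)')

def find_hrefs(text):
    """Return (name, url) pairs; the URL is kept raw, with no colouring."""
    return HREF.findall(text)

def encode_url(url):
    """Percent-encode spaces and similar, leaving ':' and '/' alone."""
    return quote(url, safe=":/")

if __name__ == "__main__":
    print(find_hrefs('see "the docs":http://some.url/a%20b for more'))
    print(find_hrefs('so you can say "this": that'))
    print(encode_url("http://some.url/my page"))
```

Because the URL must start straight after the '":', text such as
"this": that is left alone. Trailing punctuation and the genuinely
confusing cases are not handled here.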

Tibs

--
Tony J Ibbs (Tibs)      http://www.tibsnjoan.co.uk/
"How fleeting are all human passions compared with the massive
continuity of ducks." - Dorothy L. Sayers, "Gaudy Night"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)