[Doc-SIG] Comments on the reST specification

David Goodger dgoodger@bigfoot.com
Sun, 05 Aug 2001 13:30:18 -0400


on 2001-08-03 6:01 AM, Tony J Ibbs (Tibs) (tony@lsl.co.uk) wrote:
> An introduction to reStructuredText version 1.12 of 2001/07/10
> ==============================================================
> 
> intro:`Goals`_
> 
>     A tertiary useful goal, for many of us (well, for me at least) is
>     just to have *some* form of structured text, to read *as such*, for
>     use in docstrings, and for reading within docstrings.

I've amended the last sentence of the first paragraph of "Goals",
borrowing some of your words, to:

    The intended purpose of the reStructuredText markup is twofold:

    - the establishment of a set of standard conventions allowing the
      expression of structure within plaintext, and

    - the conversion of such documents into useful structured data
      formats.

> reStructuredText Markup Specification version 1.41 of 2001/07/20
> ================================================================
> 
> Hmm - to someone reading the raw text, it can be useful to know whether
> a link is to a target in the document or outside it. Of course, this can
> be done (and probably *should* be done, by a good author) by writing the
> text correctly

Yes, that's the ticket. Adding to the markup for this isn't worth it.

> spec:`Quick syntax overview`_
> 
>     In describing paragraphs, you use the term "flush left". To me this
>     means "flush against the left margin". Either "left justified" or
>     (as you use elsewhere) "left aligned" would be better.

Fixed.

> spec:`Whitespace`_
> 
>     You don't mention what happens if there are multiple blank lines
>     between paragraphs, etc.

Amended.

>     As to spaces versus tabs - this sounds like it will lead to all of
>     the traditional discussions we get on the Python list about whether
>     one can mix spaces/tabs, and whether a tab is *really* 8 or 4
>     spaces, leading finally to space eating nanoviruses (ick). I'd
>     suggest that this is dangerous territory, and that the behaviour
>     of tabs should be carefully undefined.

Or precisely defined, which is how it is now. Any specific
suggestions for improvements?

> spec:`Section structure`_
> 
>     You don't say which characters may be used as "underlines" for
>     section titles.

The spec says:

    An underline/overline is a line of non-alphanumeric characters
    that begins in column 1 and extends at least as far as the right
    edge of the title text.

So any character in the RE character class '[!-/:-@[-`{-~]' is valid.
Should I spell them all out? Restrict them further? (I know, some of
them wouldn't be appropriate, but that's an aesthetic decision.)
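
To make the rule concrete, here is a minimal Python sketch of it. The
name `is_valid_underline` is hypothetical and not part of any existing
parser. The sketch accepts a mix of different punctuation characters in
one underline, because the quoted rule does not forbid it; a real
implementation might be stricter.

```python
import re
import string

# The character class from above, with "[" escaped for readability.
# It covers the printable 7-bit ASCII characters that are not alphanumeric.
UNDERLINE_RE = re.compile(r"[!-/:-@\[-`{-~]+")


def is_valid_underline(title, underline):
    """Hypothetical check of an underline against its title text."""
    title = title.rstrip()
    underline = underline.rstrip()
    # Space is not in the class, so a match also means the line
    # begins in column 1 (no leading whitespace).
    if UNDERLINE_RE.fullmatch(underline) is None:
        return False
    # Must extend at least as far as the right edge of the title text.
    return len(underline) >= len(title)


# Collect every ASCII character the class accepts.
MATCHED = {chr(c) for c in range(128) if UNDERLINE_RE.fullmatch(chr(c))}
```

The class works out to the same 32 characters as Python's
`string.punctuation`, which may be the simplest way to spell them out.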

>     Further down you say "nor must any specific section title style be
>     used". I think that "must" would be better as "need".

Done.

> spec:`Bullet lists`_
> 
>     The first incorrectly formatted example is also indented
>     incorrectly - make it say so!

OK!

> spec:`Enumerated lists`_
> 
>     The syntax diagram is missing.

Added.

> spec:`Tables`_
> 
>     In the example, in the cell that contains lines starting ``-``, does
>     that content count as a list? If so, the cell boundaries (top and
>     bottom) must be acting as the delimiting blank lines - say so.

Yes. Said.

> spec:`Footnotes`_
> 
>     "case insensitive single words" is unclear to me - it gets explained
>     later on, so maybe that explanation could be shifted or copied?

It gets explained right then & there, in parentheses:

    Footnote labels are case-insensitive single words (alphanumerics
    plus hyphens, underscores, and periods; no whitespace).

Is this insufficient? Which later explanation are you referring to?
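
For what it's worth, the parenthetical translates almost directly into
code. This is only a sketch: the function name is hypothetical, and
"alphanumerics" is assumed here to mean ASCII letters and digits.

```python
import re

# Alphanumerics plus hyphens, underscores, and periods; no whitespace.
LABEL_RE = re.compile(r"[A-Za-z0-9._-]+")


def normalize_footnote_label(label):
    """Return the case-folded label, or None if it is not a single word."""
    if LABEL_RE.fullmatch(label) is None:
        return None
    # Labels are case-insensitive, so compare them in lowercase.
    return label.lower()
```

With this, "GVR-2001" and "gvr-2001" refer to the same footnote, while
"two words" is rejected.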

> spec:`Hyperlink targets`_
> 
>     "whitespace neutral" doesn't mean anything to me.
> 
>     When you say they may contain whitespace, clarify if you include
>     newlines (some people include this as whitespace, some not). Yes, I
>     know the examples later on make it clear, but it would be nice to be
>     clear up front in the text.

One problem with writing a spec is that you need to write everything
everywhere, and it becomes a chore to write and a bore to read. That's
why we have "see this" and "refer to that".

I have extracted and expanded the relevant material into a new
section, "Hyperlink Names", and added references to other sections.

>     Is it allowed to use *redundant* backquotes - for instance::
> 
>         .. _`a b`:
>         .. _`ab`:

Yes. Made explicit.

> spec:`Comments`_
> 
>     Arbitrary text after a ``.. `` is treated as a comment.
> 
>     I don't think this is right.

I've been thinking about this also, with regard to the
incompatibility of comments (as they are now) and subsequent block
quotes: a block quote would be "swallowed up" by the comment. I'd
thought of making comments one-liners, as ``#`` is in Python.

>     Initially, I didn't see *any* point to having comments, but you'll
>     see I've used them occasionally in these notes. But I think the
>     way to introduce a comment is with a "proper" named directive::
> 
>         .. comment:: this is a comment, and
>            it continues in the normal manner
>            for a directive.

This seems right to me too. Let's take it a bit further: we'll limit a
comment to a single text block (i.e., up to the next blank line). A
"multi-block comment" could use "comment-start" and "comment-end"
directives. This would remove the indentation-incompatibility.

>     I believe strongly that a directive that does not start in a
>     legitimate manner should be treated as a "warning" (type 1 error?)
>     and that an output processor should (by default) not place its text
>     into the output, although it should allow the option of presenting
>     it in some delimited manner (perhaps in a different colour).

Putting the unimplemented directive's block into a literal block
inside a level-1 (or 2) system warning should do the trick.

>     Part of this is just that I think a comment (if used) should be
>     explicitly identified as such, but mostly it's for compatibility
>     with Stretch

What's "Stretch"? Link?

>     I think we get a more robust format if:
> 
>     1. Comments are a directive like any other::
> 
>           .. comment:: some text
> 
>        Heh - it even says what it is!

Yes, explicit. Which brings the syntax full-circle, since the
dot-dot-space syntax is now called "explicit markup".

>     2. Anything that is not a directive or a hyperlink target or a
>        footnote (I forgot those above - sorry) is of undefined
>        behaviour, and will generate a warning.
> 
>        This allows for extension in all sorts of ways in the future,
>        with minimal restriction.

I'm just about convinced. I'll ponder it some more. Any further
arguments or counterarguments, anyone?

>     Incidentally, this would also mean that directives can use a single
>     colon as a delimiter - I think this would be easier to remember

Perhaps.

> spec:`Inline markup`_
> 
>     Item 1 says that inline markup start strings must be immediately
>     preceded by whitespace and zero or more of various characters. This
>     precludes having a start string at the start of a file, and maybe
>     even at the start of a paragraph (depending on how one views the
>     whitespace (blank line) that indicates the start of the second and
>     successive paragraphs).

I've amended the definition to explicitly allow start and end cases.

>     Item 5 mentions '<' and '>', but items 1 and 4 do not. I
>     assume this is a mistake.

Correct; already corrected.

> spec:`Interpreted text`_
> 
>     I'm not too keen on having the "role" inside the string (as you
>     might guess from my attempt at "namespaces" in these notes). I
>     assume that the compelling reason, for you, is a wish to allow
>     whitespace in the role name (whereas I'd overgeneralise RFC 822,
>     or perhaps XML names, or Python identifiers, or something).
> 
>     Personally, I prefer::
> 
>         role:`interpreted text`_
> 
>     to::
> 
>         `role: interpreted text`_

(Note: you don't need trailing underscores here. That turns them into
hyperlinks.)

>     because I think the former is easier for *me* to parse (e.g.,
>     specifically as something like class:`Fred`_ - the quotes go around
>     the *name* which feels right).

Definitely food for thought. I'll mull this over; opinions welcome.

>     By the way, it is not acceptable to leave out a description of how
>     to determine the difference between (for instance)::
> 
>         `role: Fred`_
>         `role: Fred`_
> 
>     in some arbitrary application where the former is identifying a
>     role, but the latter is identifying something *called*
>     (legitimately) "role: Fred" - and whilst I can't come up with a
>     concrete example now, this disturbs me greatly (and is perhaps the
>     main reason I prefer to move the role name outwith the string).

For the second, do you mean this? ::

    `role\: Fred`

I agree that placing the role outside of the backquotes would
alleviate this problem, but the tradeoff is that roles become single
words. Maybe that's not so bad though.

> spec:`Standalone hyperlinks`_
> 
>     You need to explain the explicit rules used to detect such an
>     animal, otherwise this is too vague.

I refer to RFC 2396; I don't want to repeat it all. I've added this to
"Standalone Hyperlinks":

    Two forms of URI are recognized:
    
    - absolute URIs beginning with a scheme ('http:', 'ftp:', 'mailto:',
      'telnet:', etc.), and

    - standalone email addresses ('user@host').

    Standalone email addresses are treated as if they had a 'mailto:'
    prefix.
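
A rough Python sketch of the two recognized forms follows. The function
name and both patterns are assumptions made for illustration: the scheme
pattern follows the RFC 2396 scheme syntax but accepts any scheme-like
word, and the email pattern is deliberately loose.

```python
import re

# RFC 2396 scheme: a letter, then letters, digits, "+", "-" or ".",
# followed by ":" and at least one non-whitespace character.
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:\S+")
# Loose 'user@host' form; real address syntax is more involved.
EMAIL_RE = re.compile(r"[^\s@:]+@[^\s@:]+")


def standalone_uri(token):
    """Return the URI for a standalone hyperlink token, or None."""
    if SCHEME_RE.fullmatch(token):
        return token
    if EMAIL_RE.fullmatch(token):
        # Standalone email addresses are treated as 'mailto:' URIs.
        return "mailto:" + token
    return None
```

A bare 'user@host' and an explicit 'mailto:user@host' end up as the
same URI.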

> Miscellaneous notes
> -------------------
> I understand (now) that the two-dots-and-a-space-after-newline convention
> is taken from setext, but why does it need to be two dots *and a space* -
> given it must occur after a newline (and optional whitespace, of course),
> is there a compelling reason for the space, or is it just for
> compatibility? Would it be better without the space?

Ellipses ('...') are common in text, and possible at the beginning of
a paragraph. Two dots is not nearly so common. I'd venture to say that
its use in text is a mistake. I seem to recall that Pascal used it for
ranges, but that wouldn't adversely affect us here.
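
The difference is easy to see in a small Python sketch. The pattern and
function name are assumptions for illustration only; the actual parser
may treat edge cases (such as a bare ".." at the end of a line)
differently.

```python
import re

# Optional leading whitespace, then two dots and a space.
EXPLICIT_START_RE = re.compile(r"[ \t]*\.\. ")


def starts_explicit_markup(line):
    """True if the line begins with the two-dots-and-a-space marker."""
    return EXPLICIT_START_RE.match(line) is not None
```

A paragraph beginning with an ellipsis ("... and so on") is not
mistaken for explicit markup, because its third character is a dot
rather than a space.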

> Python extensions version 1.19 of 2001/07/20
> ============================================
> 
> py:`Option lists`_
> 
>     I don't think these belong here.

I've been considering merging the Python extensions into the main
spec, and merging the DTDs. I'd like to keep them separate for now to
allow for non-Docstring contexts. But option lists are not only for
Python, so maybe I'll move them over anyhow.

>     These are not *Python* options - these are Unix shell command
>     options (with common GNU extensions). They can clearly be dealt with
>     by other means, and should be so.

By "by other means", do you mean such as the example you gave, below?
I think that option lists are common enough (some form is in most
every command-line program I've ever written), and I'd like to keep
them. Unless there's something *problematic* about parsing them, I
don't see a problem with having them available. Of course, anyone who
wants to use alternatives (tables, bullet lists, definition lists) is
free to do so.

>     Alternatively, try defining a new syntax for "tabbed tables"
>     (something like the facilities available in LaTeX) - something
>     that looks like::
> 
>         .. -a        :: Output all
>         .. -b        :: Output both (this description
>                       is quite long)

Excuse me, but, ugh.

> py:`Interpreted text`_
> 
>     As said elsewhere, use of `..` (undecorated with "_") to indicate
>     Python "elements" is a neat idea (if one doesn't want to use
>     '#..#').
> 
>     I agree that the use of roles to disambiguate this is helpful, *but*
>     I've already said elsewhere my problems with the specific way of
>     doing it.
> 
>     I can't help feeling we shouldn't need as *many* roles as you
>     enumerate - how is a poor user to keep track, especially of the ones
>     that aren't part of normal Python terminology (like "instance
>     attribute").

The way I envisage it happening is that, in Python docstring mode, the
parser has access to the namespace of the object being documented, and
can determine the correct interpretation through namespace lookups.
(No, I haven't implemented it yet, and yes, I know it ain't gonna be
easy.)
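
To give a flavour of what such a lookup might look like, here is a
purely hypothetical Python sketch. Nothing like `guess_role` exists in
the project; the role names it returns and the `Sample` class are made
up for the illustration, and it only handles names found on a class.

```python
import inspect


def guess_role(owner, name):
    """Hypothetical helper: guess a role for `name` in the class `owner`."""
    try:
        # getattr_static looks the name up without invoking descriptors.
        value = inspect.getattr_static(owner, name)
    except AttributeError:
        return None
    if isinstance(value, (staticmethod, classmethod)) or inspect.isfunction(value):
        return "method"
    if inspect.isclass(value):
        return "class"
    return "class attribute"


class Sample:
    limit = 10

    def run(self):
        return self.limit
```

Instance attributes set inside methods would not show up in a lookup
like this one, which is part of why it isn't going to be easy.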

Isn't "instance attribute" part of Python terminology? That's what *I*
call them. Variables set in the class definition are class attributes,
those set in methods are instance attributes, etc. Is there a more
common set of names?
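
In code, the distinction I have in mind looks like this (the `Counter`
class is just an invented example):

```python
class Counter:
    step = 1  # class attribute: set in the class definition

    def __init__(self):
        self.total = 0  # instance attribute: set in a method

    def bump(self):
        # Reads the class attribute, updates the instance attribute.
        self.total += self.step
        return self.total


counter = Counter()
```

Here `step` lives in the class namespace and `total` in each
instance's namespace.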

>     "Class attribute" (for instance) is too long, and "classatt" too
>     horrible - if we're producing these, it means we need to think
>     longer about how to specify Pythonic roles.

Suggestions welcome.

>     What is a "warning" or "warning class"?

New in Python 2.1, they're like exceptions that don't terminate the
program.
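
A short sketch with the standard `warnings` module shows the idea. The
`DocstringWarning` class and `parse` function are hypothetical, and
`warnings.catch_warnings` is a later addition to the module than Python
2.1, used here only to capture the warning for inspection.

```python
import warnings


class DocstringWarning(UserWarning):
    """Hypothetical warning class used only for this illustration."""


def parse(text):
    if "\t" in text:
        # Issues a warning; unlike `raise`, execution carries on.
        warnings.warn("tab found in input", DocstringWarning)
    return text.split()


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = parse("a\tb")
```

The warning is recorded in `caught`, and `parse` still returns its
result.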

>     The "argument" role is missing.

No, it's omitted on purpose. "Parameters" are the names in
function/method definitions. "Arguments" are like local variables, not
requiring API documentation.

>     Specific recommendation: use "name" instead of "variable" (what's
>     the other recommendation that goes with that?).

Maybe. But "name" is such an overloaded concept. The Python RefMan
refers to them as "identifiers (names)".

(As to your question: I dunno, what?)

-- 
David Goodger    dgoodger@bigfoot.com    Open-source projects:
 - Python Docstring Processing System: http://docstring.sourceforge.net
 - reStructuredText: http://structuredtext.sourceforge.net
 - The Go Tools Project: http://gotools.sourceforge.net