[Doc-SIG] docstring grammar

Edward Welbourne <eddyw@lsl.co.uk>
Tue, 30 Nov 1999 14:35:21 +0000


> Thus:
> * Any line starting with a word followed by a colon can be considered
> a keyword.  If you don't want this, just make sure it's not the first
> word on the line.

Not happy.  A paragraph of text which precedes an example may be relied
upon to end in `for example:', in which the last contiguous block of
non-space characters is of length 8; if I modify an earlier part of the
paragraph, I'm going to ask my authoring tool (python-mode.el) to
reformat the paragraph, without necessarily being aware of a gotcha
waiting for me at the paragraph's end; my margins will be within 72
characters of one another, giving a roughly 1 in 9 chance that
`example:' ends up being alone on the last line ... gotcha.
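
To make the gotcha concrete, here is a small sketch.  It assumes Python's
textwrap is a fair stand-in for the editor's fill command (python-mode.el
will not break lines identically); the helper name and the sample paragraph
are invented for the illustration.

```python
import textwrap

def widths_that_strand_last_word(paragraph, widths):
    """Return the fill widths at which the last word sits alone on the final line."""
    last_word = paragraph.split()[-1]
    stranded = []
    for width in widths:
        lines = textwrap.wrap(paragraph, width=width)
        # The gotcha: the final line holds nothing but the last word,
        # so `example:' now starts a line and looks like a keyword.
        if lines[-1].strip() == last_word:
            stranded.append(width)
    return stranded

# Sample paragraph, made up for this sketch.
paragraph = (
    "A paragraph of text which precedes an example may be relied "
    "upon to end in some introductory phrase, for example:"
)
print(widths_that_strand_last_word(paragraph, range(40, 80)))
```

Counting how many of the tried widths come back gives a rough check on
the one-in-nine estimate for any particular paragraph.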

A cure for this would just be to do keyword-recognition case
sensitively, and Capitalise keywords; otherwise, we have to insist on
either a dedent or a blank line preceding any keyword.  Which offends
folk worse: case sensitivity or needing a dedent/vspace ?
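
For comparison, a minimal sketch of the two cures side by side.  The
regular expressions, the function names and the sample lines are my own
assumptions, not part of any agreed grammar; in particular I assume a
keyword is a single alphabetic word followed by a colon and then
whitespace or end of line.

```python
import re

# Assumed shapes for a keyword at the start of a line.
CAPITALISED = re.compile(r"^\s*([A-Z][A-Za-z]*):(\s|$)")
ANY_CASE = re.compile(r"^\s*([A-Za-z]+):(\s|$)")

def indent_of(line):
    """Number of leading whitespace characters."""
    return len(line) - len(line.lstrip())

def find_keywords(lines, case_sensitive):
    """Return (line_number, keyword) pairs.

    case_sensitive=True:  only Capitalised words count as keywords.
    case_sensitive=False: any word counts, but only after a blank
                          line or a dedent.
    """
    found = []
    previous = ""  # treat the start of the docstring like a blank line
    for number, line in enumerate(lines, start=1):
        if case_sensitive:
            match = CAPITALISED.match(line)
        else:
            match = ANY_CASE.match(line)
            separated = (not previous.strip()
                         or indent_of(line) < indent_of(previous))
            if not separated:
                match = None
        if match:
            found.append((number, match.group(1)))
        previous = line
    return found

sample = [
    "    some earlier text which ends, for",
    "    example:",
    "",
    "Xrefs:",
    "    [this] somewhere",
]
print(find_keywords(sample, case_sensitive=True))
print(find_keywords(sample, case_sensitive=False))
```

Either rule skips the stranded `example:' on the second line of the
sample while still catching Xrefs.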


> * A star or dash starting a line can be considered a new list item.
> Again, if it is truly a hyphen or whatever else, just adjust your line
> wrap slightly so it is no longer the first word.

Alternatively, all lists use the same `item-introducer' character and
follow it with an optional character indicating what bullet to use.
Thus one might have (taking ~ as the introducer for the illustration)

  ~ outermost list, first item
  ~ outer second which may contain a subordinate
    ~ which is indented so it can use the same introducer without
      confusion
    ~ and output formatters can choose different symbols
      in place of the star for successive nesting layers
    ~ by the way, should further lines line up with the text or the
      bullet ?  my reckoning is with the text ...
  ~ outer third, whose subordinate might want Roman numerals
    ~i so it indicates them thus
    ~i and can choose to leave the engine to sort out numbering
    ~iii but can effectively assert that one item (referred to
         elsewhere) has a particular number
    ~i without having to mention numbers for the rest
    ~i and of course 
       ~1 we can use the other numbering styles
       ~2 including alphabetic, upper or lower, using ~A or ~a.
       ~1 with use of first in series taken as `work out right number'
       ~7 but I think the tool should complain if you get later
          positions wrong: it's an assertion, and it indicates that this
          item is going to be referred to from other text as item 7 - I
          need to be told I got it wrong !  Obviously I've deleted a few
          items before this one without realising what's happening below ...
  ~ outer fourth
    ~o must the bullets in a given list all match ?
       ~. should stand for mid-dot, and star is likewise easy using *
    ~o I think so, anyway
      ~- dash is obvious and now unambiguous, as are + and =
    ~o mind you, o requires care: if it's the first item in a list, that
       list is going to use o as its bullet; but if it appears in a list
       which began with a ~a then we have to read it as item fifteen.
      ~ and if we're insisting on all items in a list having the same
        bullet, does it make sense to allow items after the first to
        just use an unadorned introducer meaning re-use of first item's
        symbol, thus saving us lots of editing when we want to change
        the symbol in use by a list, or shuffle an item from a sub-list
        out into its parent list (or etc.)
      ~ of course, ~ needn't be the bullet-introducer, we could use
        pretty much any punctuator as long as it doesn't obviously
        clash; candidate egs: #, @, $, %, &, * and even |
  ~ outer fifth
    ~ as for descriptive lists, I'd go with the old gendoc form, which

      uses double dash -- which just feels so natural, but

      needs vspace -- to separate items, given that -- might be used
      within an item on a later-than-first line.  I can live with this.
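
The numbering rules in the list above can be sketched roughly as follows.
This is only one reading of them: I assume the first item's marker fixes
the style of the whole list, that a repeat of that first marker (or a
bare introducer) means `work out the right number', and that anything
else is an assertion to be checked.  All the function names are
hypothetical, and malformed markers are not handled gracefully.

```python
ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100}

def roman_to_int(text):
    """Convert a lower-case Roman numeral such as 'iii' or 'iv' to an int."""
    total = 0
    for ch, nxt in zip(text, text[1:] + " "):
        value = ROMAN[ch]
        # A smaller numeral before a larger one is subtracted.
        total += -value if ROMAN.get(nxt, 0) > value else value
    return total

def list_style(first_marker):
    """The first item's marker decides how the rest of the list is read."""
    if first_marker == "1":
        return "arabic"
    if first_marker == "i":
        return "roman"
    if first_marker in ("a", "A"):
        return "alpha"
    return "bullet"  # o, ., *, -, +, = or nothing at all

def asserted_position(marker, style):
    """The position a marker claims, read in the list's style."""
    if style == "arabic":
        return int(marker)
    if style == "roman":
        return roman_to_int(marker)
    return ord(marker.lower()) - ord("a") + 1  # alpha: o is fifteen

def check_list(markers):
    """Return each item's position (None for bullets); complain about
    any asserted position which does not match."""
    style = list_style(markers[0])
    if style == "bullet":
        return [None] * len(markers)
    positions = []
    for index, marker in enumerate(markers, start=1):
        inferred = marker in ("", markers[0])
        if not inferred and asserted_position(marker, style) != index:
            raise ValueError(f"item {index} is marked ~{marker}")
        positions.append(index)
    return positions

print(check_list(["i", "i", "iii", "i", "i"]))  # the Roman list above
```

Fed the markers of the innermost list above (~1, ~2, ~1, ~7), check_list
raises ValueError for the fourth item, which is the complaint I was
asking for.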

> Other random thoughts:
> * The [blah] notation is good, but needs to be well defined.  eg,
> "[module.function]" when used in the context of a package should use
> the same "module scoping" that Python itself uses.

The thing that saves [this] from being problematic is that the format in
which it was introduced presumed that one was going to use a brief
mnemonic as [this] word and end the docstring with a chunk which
explains the cross-references (new keyword: Xrefs ?) and, in particular,
tells the doc-string-reader which [tokens] actually have a translation,
the rest being left as typed; thus, if this paragraph appeared in a
docstring which says how to translate [this] (giving an xref and -
optionally - a text to use (default `this') in place of [this]), the
digested form would duly replace [this] but leave [tokens] as it is.

To further simplify life, I'd understood the [this] keys that are
translatable to insist on [nowhitespace] to save the parser most of its
`this might be an xref' pending decisions - which is why the Xrefs
section needs to at least have the option of specifying the text to be
used in place of [this] as well as the Xref to point it at.  What we're
doing is citation, which is widely done with [].

No need for [this] to be a [module.function] or anything like - the
Xrefs section provides the translation.

Xrefs:
   [gendoc] http://www.python.org/contrib/gendoc/
   [this] http://www.python.org/lists/doc-sig/hideous?with=data&as=you+will The present message
   [copy] string.copy the standard string copy function
   [etc] location sub sti tute

[sorry, all exhibited xrefs are bogus - illustrative only]
I'm sure that's only a minor paraphrase of a spec I saw a while ago on
this list ...
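
A rough sketch of how a reader might digest such a section, assuming
each entry has the shape `[key] location optional-text' on one line.
The function names and the `text <location>' output form are invented
for the illustration; a real formatter would emit whatever link syntax
suits its output.  The entries are the bogus ones from above.

```python
import re

# [key] location, then optional replacement text to the end of the line.
XREF_LINE = re.compile(r"^\s*\[([^\s\[\]]+)\]\s+(\S+)(?:\s+(.*\S))?\s*$")
# A citation in running text: brackets round a token with no whitespace.
CITATION = re.compile(r"\[([^\s\[\]]+)\]")

def parse_xrefs(lines):
    """Map each key to (location, text); text defaults to the key itself."""
    table = {}
    for line in lines:
        match = XREF_LINE.match(line)
        if match:
            key, location, text = match.groups()
            table[key] = (location, text or key)
    return table

def digest(body, table):
    """Replace the [tokens] which have a translation; leave the rest as typed."""
    def replace(match):
        key = match.group(1)
        if key not in table:
            return match.group(0)
        location, text = table[key]
        return f"{text} <{location}>"  # assumed output form
    return CITATION.sub(replace, body)

xrefs = [
    "   [gendoc] http://www.python.org/contrib/gendoc/",
    "   [copy] string.copy the standard string copy function",
]
table = parse_xrefs(xrefs)
print(digest("see [gendoc] but leave [tokens] and [a b] alone", table))
```

Because a key may not contain whitespace, [a b] is never even considered
as a citation, and [tokens] is left alone because the table has no entry
for it.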

Of course, Xrefs might better be called Bibliography.

We can use as `location' some pythonic reference that can be resolved in
the ways that the suggested module.function semantics point to: indeed,
I would take this as what to try first, falling back on recognising
other stuff as URLs and similar.
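
That order of attempts might look something like the sketch below.  It
assumes the location is an absolute dotted name (the package-relative
`module scoping' is not attempted here), and the function names and the
three category labels are hypothetical.  Note that resolving a name this
way imports the module, with whatever side effects that has.

```python
import importlib
from urllib.parse import urlparse

_MISSING = object()  # sentinel, so that a resolved None is not mistaken for failure

def resolve_python(name):
    """Try to resolve a dotted name such as 'module.function'."""
    parts = name.split(".")
    if not all(part.isidentifier() for part in parts):
        return _MISSING
    # Try the longest module path first, then walk attributes from it.
    for cut in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:cut]))
        except ImportError:
            continue
        try:
            for attr in parts[cut:]:
                obj = getattr(obj, attr)
        except AttributeError:
            return _MISSING
        return obj
    return _MISSING

def classify_location(location):
    """Pythonic reference first, then URL, else give up."""
    if resolve_python(location) is not _MISSING:
        return "python"
    parsed = urlparse(location)
    if parsed.scheme and parsed.netloc:
        return "url"
    return "unknown"

for location in ("os.path.join",
                 "http://www.python.org/contrib/gendoc/",
                 "string.copy"):
    print(location, "->", classify_location(location))
```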

> ... However, the use
> of brackets may conflict with people who use inline code (rather than
> an example "block") - maybe something like "@" could be used?
> @module.function@ would be reasonable.

With the above, can we evade this ?
The fact that [citations] are so widely used argues for the [form]; and
the fact that [anything with space in it] isn't a citation should make
all the `ordinary text' and `python denotations' [usages] unproblematic,
while leaving untranslated ones as [literal] uses of [ and ].  If
nothing else, I find my eye latches onto [cite] better than @cite@ ...
and bear in mind that @ has some other magic uses,

parser error - unclosed citation at line 137:
      Sender: eddyw@lsl.co.uk

All told, we seem to have a fairly good spec ... save for some
nitpickery ;^>

Tibs said:
> David (Ascher) - is it time to re-release your initial "docstring
> grammar"
and I confess that's something I'd like to see too.
After all, we have to have someone to play Gdo ...

	Eddy.