[Doc-SIG] docstring grammar
David Ascher
da@ski.org
Sun, 28 Nov 1999 16:57:03 -0800 (Pacific Standard Time)
Proposed format for docstrings:
The whitespace at the beginning of a docstring is ignored.
Paragraphs are separated by one or more blank lines.
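As an illustration only, the two rules above could be sketched in present-day Python 3 as follows. The function name `split_paragraphs` is hypothetical and not part of the proposal.

```python
import re

def split_paragraphs(docstring):
    """Ignore leading whitespace, then split on one or more blank lines."""
    body = docstring.strip()
    if not body:
        return []
    # A paragraph break is a newline followed by at least one blank line.
    # Indentation of the following paragraph is left untouched.
    return re.split(r"\n(?:[ \t]*\n)+", body)
```

Indentation inside later paragraphs is preserved on purpose, since the nesting rules further down depend on it.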
For compatibility with Guido, IDLE and Pythonwin (and to increase the
likelihood that the proposal will be accepted by GvR), the
docstrings of callables must follow the convention
established in Python's builtins:
>>> print len.__doc__
len(object) -> integer

Return the number of items of a sequence or mapping.
In other words, the first paragraph must fit on one line and consist
of the name of the callable with a 'wordy' signature, the ' -> '
string, and the type of the return value. The second paragraph must
be a one-sentence description of the callable. The two parts may
instead be joined on a single line by a " -- " string:
>>> print [].pop.__doc__
L.pop([index]) -> item -- remove and return item at index (default last)
and functions which don't return anything can omit the " -> foo"
bit:
L.append(object) -- append object to end
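A rough sketch of how a tool might recognize the three signature forms shown above, in present-day Python 3. The regular expression and the name `parse_signature` are assumptions made for illustration; the proposal does not define them.

```python
import re

# Hypothetical pattern: name(args) [-> return_type] [-- description]
SIGNATURE = re.compile(
    r"^(?P<name>[\w.]+)"                # callable name, e.g. len or L.pop
    r"\((?P<args>.*?)\)"                # the 'wordy' argument list
    r"(?:\s*->\s*(?P<returns>.+?))?"    # optional return type
    r"(?:\s+--\s+(?P<summary>.+))?$"    # optional one-line description
)

def parse_signature(line):
    """Return a dict of the signature parts, or None if the line does not match."""
    match = SIGNATURE.match(line.strip())
    return match.groupdict() if match else None
```

Parts that are absent, such as the return type of `L.append`, come back as None.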
Each paragraph is either 'text' or a 'keyword-tagged block'.
A keyword is a case-sensitive element of [a-zA-Z_]+ followed by two
colons (with optional whitespace between the keyword and the colons,
but no whitespace allowed between the two colons).
A paragraph which doesn't start with a keyword is 'text'.
Characters between # signs and the end of the line are stripped by
the docstring parser.
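The keyword and comment rules above might be sketched like this. The names `KEYWORD`, `strip_comment` and `split_keyword` are hypothetical, and the comment stripping is deliberately naive: the proposal does not say how a literal # (in a URL fragment, say) would be escaped.

```python
import re

# A keyword: [a-zA-Z_]+, optional whitespace, then '::' with no gap between the colons.
KEYWORD = re.compile(r"^(?P<key>[a-zA-Z_]+)\s*::(?P<rest>.*)$")

def strip_comment(line):
    """Drop everything from the first '#' to the end of the line."""
    return line.split("#", 1)[0].rstrip()

def split_keyword(line):
    """Return (keyword, value) for a keyword line, or (None, text) for plain text."""
    text = strip_comment(line).strip()
    match = KEYWORD.match(text)
    if match:
        return match.group("key"), match.group("rest").strip()
    return None, text
```

A line such as `Note: this is plain text` has only one colon, so it stays 'text'.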
A 'keyword-tagged block' is nested much like Python code. Just like
in Python, the block can either be on the same line as the keyword
if it is one-line long (I'll refer to such blocks as 'text' blocks
even though they aren't in visual paragraphs), or needs to be
indented relative to the keyword.
Examples:
Author:: Guido van Rossum # comments are stripped
Date_of_release :: 1/1/1999 # The key is "Date_of_release" and the
                            # whitespace before the : is stripped
Contributors:: # The value is a block of lines.
    John Doe
    Ronald Reagan
    Francois Mitterand
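Since blocks nest by indentation "much like Python code", one possible way to recover the structure is a small indentation stack. This is only a sketch: `parse_blocks` is a hypothetical name, indentation is assumed to use spaces rather than tabs, and comments are assumed to have been stripped already.

```python
def parse_blocks(lines):
    """Nest lines by indentation, returning a list of (text, children) pairs."""
    root = []
    # Stack of (indent, children_list); the root sits at indent -1.
    stack = [(-1, root)]
    for raw in lines:
        if not raw.strip():
            continue  # blank lines only separate paragraphs
        indent = len(raw) - len(raw.lstrip(" "))
        # Pop back to the nearest enclosing block that is less indented.
        while indent <= stack[-1][0]:
            stack.pop()
        node = (raw.strip(), [])
        stack[-1][1].append(node)
        stack.append((indent, node[1]))
    return root
```

Given a `Contributors::` line followed by indented names, the names end up as children of the `Contributors::` node, while a one-line block such as `Author:: ...` has no children.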
Some keywords can have special parsing rules, as the block of text
which the keyword designates is well-specified by the rules above.
The first example of such a keyword-specific parsing rule is for
Arguments:
Arguments::
    self -- instance
    input (sequence) -- the sequence which is being processed
(the specific syntax of Arguments:: is left for a later discussion).
Other candidates which can impose specific parsing rules are:
ReturnType, Date, Version, etc.
Text blocks can be followed by indented blocks as well -- those are
'children' blocks of the outdented block.
'text' blocks which start with * or - are tagged as 'bullet items'
for rendering. The bullet marker has to be consistent within a
given level of indentation.
Example:
* this is one bullet
  - this is a sub-bullet
  - this is another sub-bullet
* this is another bullet
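The consistency rule for bullet markers could be checked along these lines. `check_bullets` is a hypothetical helper, and it simplifies the rule by treating each indentation width as one level for the whole docstring rather than per list.

```python
def check_bullets(lines):
    """Return the 1-based numbers of lines whose bullet marker disagrees
    with the first marker seen at the same indentation."""
    marker_at = {}  # indentation width -> marker first seen at that width
    errors = []
    for number, raw in enumerate(lines, start=1):
        text = raw.lstrip(" ")
        if text[:2] not in ("* ", "- "):
            continue  # not a bullet item
        indent = len(raw) - len(text)
        expected = marker_at.setdefault(indent, text[0])
        if text[0] != expected:
            errors.append(number)
    return errors
```

Reporting line numbers rather than raising on the first problem is a design choice made here so that all inconsistent bullets can be flagged in one pass.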
In text blocks, some strings are recognized as links:
.foo in the docstring of a class will refer to the foo attribute
of the class. In the docstring of a method, it will refer to the
foo attribute of the method's class. In the docstring of a
module it will refer to a function or class defined in that
module
foo.bar will refer to the bar attribute of foo, which will be
looked up in the following namespaces in order: (to be determined)
URL notation is automatically recognized.
[foo] refers to the keyword 'foo' in the section 'References' of
the current docstring. [..] links cannot span multiple lines or
contain whitespace (as keywords can't). In other words, if a
[ is not matched by a ] on the same line, or before a whitespace
character is hit, then it is a syntax error.
References::
    foo:: My Dissertation, University Press, 1902
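A minimal sketch of the [foo] rule, assuming one line is examined at a time. `REFERENCE` and `find_references` are hypothetical names, and the sketch ignores the question of square brackets that appear for other reasons, such as `L.pop([index])` in a signature line.

```python
import re

# '[' then one or more characters that are neither whitespace nor brackets, then ']'.
REFERENCE = re.compile(r"\[([^\s\[\]]+)\]")

def find_references(line):
    """Return the keywords cited on one line; raise SyntaxError on an unmatched '['."""
    names = REFERENCE.findall(line)
    # Every '[' must have produced a link, otherwise it was left unmatched
    # on this line or was interrupted by whitespace.
    if line.count("[") != len(names):
        raise SyntaxError("unmatched '[' in: %r" % line)
    return names
```

Each returned name would then be looked up among the keywords of the References:: block of the same docstring.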
The set of keywords which are 'officially sanctioned' is:
For module docstrings:
[see Trove discussion for a good starting set -- this discussion
has been had!]
For class docstrings:
[To be determined]
For method docstrings:
[To be determined]
For function docstrings:
[To be determined]
Miscellaneous Thoughts:
I chose double-colon notation for keywords so that one can have text
paragraphs which match the 'word:' notation without having them be
interpreted as keywords.
Does this proposal make docstrings too whitespace-heavy? The
requirement to break each paragraph with a line of whitespace
means that a lot of lines are blank, especially when doing
'bulleted lists'.
The above was (quickly) written with parsing in mind. Is it really
easily parseable? If not, what needs to be changed so that it is
parseable?
I also wanted to make sure that syntax errors could be flagged early and
'localized' for aid in debugging. I'm not sure that I did that
carefully enough.
Are there normal uses in docstrings where one wants to turn off the
automatic link detection?
Is there value in having string interpolation? David Arnold mentioned
__version__ = "$Revision$[11:-2]
__date__ = "$Date$
which raises some issues. I don't think that having [11:-2]
evaluated by the docstring parser is a wise idea. However, I can
imagine that the module author could do:
__version__ = "$Revision$"[11:-2]
in the Python code, and then
Version:: %(__version__)s
in the docstring and that such a simple string interpolation
mechanism could have value. I'm not sure it's worth the
complication though. What dictionary would be used to do the
interpolation?
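For concreteness, here is what such an interpolation could look like with Python's ordinary % operator and a mapping. Using the module's own globals as the dictionary is just one possible answer to the question above, and the revision string "1.7" is a made-up example value.

```python
def interpolate(docstring, namespace):
    """Expand %(name)s fields in a docstring from the given mapping."""
    return docstring % namespace

# Stand-in for a module's globals. In real use the version control
# system would expand $Revision$; "1.7" is invented for this example.
module_namespace = {"__version__": "$Revision: 1.7 $"[11:-2]}

rendered = interpolate("Version:: %(__version__)s", module_namespace)
```

One complication this exposes: with %-style interpolation, any literal % elsewhere in the docstring would have to be written as %%.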
Hopefully constructively,
--david
PS: It goes without saying that while I railed against design by
committee, I am of course hopeful for feedback, for technical reasons
(dummy, you forgot special cases X, Y and Z!) and because I realize that a
standards proposal needs at least broad agreement if not consensus to be
effective in the long run. The sharper-eyed will note that I stacked the
deck in my favor in the above proposal by including what Guido does
naturally as valid in the proposed grammar.