[Doc-SIG] docstring grammar

David Ascher da@ski.org
Sun, 28 Nov 1999 16:57:03 -0800 (Pacific Standard Time)


Proposed format for docstrings:

  The whitespace at the beginning of a docstring is ignored.

  Paragraphs are separated by one or more blank lines.
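
  As a rough illustration of the two rules above, here is a minimal
  Python sketch.  The helper name split_paragraphs is hypothetical, and
  it assumes that whitespace-only lines count as blank:

```python
def split_paragraphs(docstring):
    """Split a docstring into paragraphs, each a list of lines."""
    # Rule 1: whitespace at the beginning of the docstring is ignored.
    text = docstring.lstrip()
    paragraphs, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # Rule 2: one or more blank lines close the current paragraph.
            paragraphs.append(current)
            current = []
    if current:
        paragraphs.append(current)
    return paragraphs
```

  Indentation inside each paragraph is kept, since the later rules
  (keyword blocks, bullets) depend on it.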

  For compatibility with Guido, IDLE and Pythonwin (and to increase the
  likelihood that GvR will accept the proposal), the docstrings of
  callables must follow the convention already established by Python's
  builtins:

       >>> print len.__doc__
       len(object) -> integer

       Return the number of items of a sequence or mapping.

    In other words, the first paragraph must fit on one line and consist
    of the name of the callable with a 'wordy' signature, the ' -> '
    string, and the type of the return value.  The second paragraph must
    be a one-sentence description of the callable.  The two parts may
    also share a single line, separated by a ' -- ' string:

      >>> print [].pop.__doc__
      L.pop([index]) -> item -- remove and return item at index (default last)

    and functions which don't return anything can omit the " -> foo"
    bit:

      L.append(object) -- append object to end
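
  A minimal sketch of how a tool might recognize this signature line,
  assuming a regular expression is good enough (nested parentheses in
  the argument list are not handled).  SIGNATURE_RE and parse_signature
  are hypothetical names:

```python
import re

SIGNATURE_RE = re.compile(
    r"(?P<name>[\w.]+)"                # callable name, possibly dotted (L.pop)
    r"\((?P<args>[^)]*)\)"             # 'wordy' argument list, no nested parens
    r"(?:\s+->\s+(?P<returns>.+?))?"   # optional ' -> return type'
    r"(?:\s+--\s+(?P<summary>.+))?"    # optional ' -- one-line description'
)

def parse_signature(line):
    """Return the parts of a signature line as a dict, or None."""
    m = SIGNATURE_RE.fullmatch(line.strip())
    return m.groupdict() if m else None
```

  For "len(object) -> integer" this yields name 'len', args 'object',
  returns 'integer' and no summary; an ordinary sentence without a
  parenthesized argument list gives None, so a checker could flag it.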

  Each paragraph is either 'text' or a 'keyword-tagged block'.  

  A keyword is a case-sensitive string matching [a-zA-Z_]+, followed by
  two colons (optional whitespace is allowed between the keyword and the
  colons, but not between the two colons).

  A paragraph which doesn't start with a keyword is 'text'.  
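
  The keyword rule translates almost directly into a regular
  expression.  In this sketch KEYWORD_RE and classify are hypothetical
  names, and only the first line of a paragraph is examined:

```python
import re

# Keyword: [a-zA-Z_]+, optional spaces/tabs, then "::" with nothing between the colons.
KEYWORD_RE = re.compile(r"([a-zA-Z_]+)[ \t]*::")

def classify(paragraph):
    """Return ('keyword', name) or ('text', None) for a list of lines."""
    m = KEYWORD_RE.match(paragraph[0].lstrip())
    if m:
        return ("keyword", m.group(1))
    return ("text", None)
```

  A paragraph starting with "Note: this is ordinary text" stays 'text',
  which is the point of the double-colon choice discussed under
  Miscellaneous Thoughts.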

  Everything from a # sign to the end of the line is stripped by the
  docstring parser.
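
  Taken literally, the comment rule is a one-liner.  In this sketch
  (strip_comments is a hypothetical name) the whitespace left in front
  of the # is also removed, which is an assumption:

```python
def strip_comments(line):
    """Remove everything from the first '#' to the end of the line."""
    kept, _, _ = line.partition("#")
    # Assumption: trailing whitespace left before the '#' is dropped too.
    return kept.rstrip()
```

  Read literally, the rule would also cut a URL at a '#' fragment
  marker.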

  A 'keyword-tagged block' is nested much like Python code.  As in
  Python, the block can either sit on the same line as the keyword, if
  it is one line long (I'll refer to such blocks as 'text' blocks even
  though they aren't visually separate paragraphs), or be indented
  relative to the keyword.

    Examples:

      Author:: Guido van Rossum   # comments are stripped

      Date_of_release :: 1/1/1999  # The key is "Date_of_release" and the
                                   # whitespace before the :: is stripped

      Contributors::               # The value is a block of lines.

          John Doe

          Ronald Reagan

          Francois Mitterrand
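
  One way to pull a keyword-tagged block out of a list of
  (comment-stripped) lines, sketched under the assumption that a block
  ends at the first non-blank line indented no deeper than its keyword.
  parse_keyword_block and indent_of are hypothetical helpers; only the
  standard library is used:

```python
import re
import textwrap

KEYWORD_RE = re.compile(r"([a-zA-Z_]+)[ \t]*::(.*)")

def indent_of(line):
    return len(line) - len(line.lstrip())

def parse_keyword_block(lines, start=0):
    """Parse the keyword-tagged block whose keyword is on lines[start].

    Returns (keyword, body_lines, next_index).
    """
    header = lines[start]
    m = KEYWORD_RE.match(header.lstrip())
    if m is None:
        raise ValueError("line %d: expected a keyword" % (start + 1))
    keyword, rest = m.group(1), m.group(2).strip()
    if rest:
        # One-line block on the same line as the keyword.
        return keyword, [rest], start + 1
    base = indent_of(header)
    i = start + 1
    while i < len(lines):
        line = lines[i]
        # A non-blank line indented no deeper than the keyword ends the block.
        if line.strip() and indent_of(line) <= base:
            break
        i += 1
    # Remove the block's common indentation and surrounding blank lines.
    body = textwrap.dedent("\n".join(lines[start + 1:i])).strip("\n")
    return keyword, body.splitlines(), i
```

  Blank lines inside the block are kept, so the body can itself be split
  into paragraphs or handed to a keyword-specific parser.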
 
    Some keywords can have special parsing rules of their own, since the
    rules above already delimit precisely the block of text that each
    keyword designates.  The first example of such a keyword-specific
    parsing rule is the one for Arguments:

      Arguments::
   
        self -- instance
        input (sequence) -- the sequence which is being processed

     (the specific syntax of Arguments:: is left for a later discussion).

     Other candidate keywords that could impose specific parsing rules
     include ReturnType, Date and Version.
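
  Since the syntax of Arguments:: is still open, the following is only
  a guess based on the two example lines: each non-blank line is assumed
  to read "name (type) -- description", with the type optional.  ARG_RE
  and parse_arguments are hypothetical names:

```python
import re

# Guessed line syntax: name, optional "(type)", " -- ", description.
ARG_RE = re.compile(
    r"(?P<name>[A-Za-z_]\w*)\s*(?:\((?P<type>[^)]*)\))?\s*--\s*(?P<desc>.*)"
)

def parse_arguments(body_lines):
    """Return a list of (name, type_or_None, description) tuples."""
    args = []
    for lineno, line in enumerate(body_lines, 1):
        line = line.strip()
        if not line:
            continue
        m = ARG_RE.fullmatch(line)
        if m is None:
            raise ValueError("Arguments line %d: cannot parse %r" % (lineno, line))
        args.append((m.group("name"), m.group("type"), m.group("desc")))
    return args
```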

  Text blocks can be followed by indented blocks as well -- those are
  'children' blocks of the outdented block.

  'text' blocks which start with * or - are tagged as 'bullet items'
  for rendering.  The bullet marker has to be consistent within a
  given level of indentation.

    Example:

       * this is one bullet
  
          - this is a sub-bullet

          - this is another sub-bullet

       * this is another bullet
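
  A sketch of the marker-consistency check, assuming that every line
  starting with * or - is a bullet and that the lines passed in form a
  single list.  check_bullets is a hypothetical name:

```python
def check_bullets(lines):
    """Return (indent, marker, text) for each bullet line.

    Raises ValueError if two bullets at the same indentation use
    different markers.
    """
    seen = {}  # indentation -> marker first used at that level
    items = []
    for lineno, line in enumerate(lines, 1):
        stripped = line.lstrip()
        if not stripped or stripped[0] not in "*-":
            continue  # not a bullet line
        indent = len(line) - len(stripped)
        marker = stripped[0]
        expected = seen.setdefault(indent, marker)
        if marker != expected:
            raise ValueError(
                "line %d: bullet %r, but %r is already used at indent %d"
                % (lineno, marker, expected, indent))
        items.append((indent, marker, stripped[1:].strip()))
    return items
```

  Reporting the line number is one way to keep such errors 'localized'.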

  In text blocks, some strings are recognized as links:

     .foo in the docstring of a class will refer to the foo attribute
     of the class.  In the docstring of a method, it will refer to the
     foo attribute of the method's class.  In the docstring of a
     module, it will refer to a function or class defined in that
     module.

     foo.bar will refer to the bar attribute of foo, where foo is looked
     up in the following namespaces, in order: (to be determined).
  
     URL notation is automatically recognized.

     [foo] refers to the keyword 'foo' in the 'References' section of
     the current docstring.  [..] links cannot span multiple lines or
     contain whitespace (just as keywords can't).  In other words, if a
     [ is not matched by a ] on the same line before any whitespace
     character, it is a syntax error.

     References::

       foo:: My Dissertation, University Press, 1902
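
  A sketch of the [foo] rule only (the .foo, foo.bar and URL rules
  depend on namespace-lookup decisions that are still open).
  find_references, check_references and DocstringSyntaxError are
  hypothetical names; reporting a column number is an assumption about
  how errors might be 'localized':

```python
import re

KEYWORD = re.compile(r"[a-zA-Z_]+")

class DocstringSyntaxError(ValueError):
    pass

def find_references(line, lineno=1):
    """Return the names of all [foo] links on one line of text."""
    names = []
    pos = 0
    while True:
        start = line.find("[", pos)
        if start == -1:
            return names
        end = start + 1
        # Scan to the closing ']' but stop at whitespace or end of line.
        while end < len(line) and line[end] != "]" and not line[end].isspace():
            end += 1
        if end == len(line) or line[end] != "]":
            raise DocstringSyntaxError(
                "line %d, column %d: '[' not closed before whitespace or end of line"
                % (lineno, start + 1))
        name = line[start + 1:end]
        if not KEYWORD.fullmatch(name):
            raise DocstringSyntaxError(
                "line %d, column %d: %r is not a keyword" % (lineno, start + 1, name))
        names.append(name)
        pos = end + 1

def check_references(text_lines, references):
    """List (lineno, name) for every [name] missing from References::."""
    return [(lineno, name)
            for lineno, line in enumerate(text_lines, 1)
            for name in find_references(line, lineno)
            if name not in references]
```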

  The set of keywords which are 'officially sanctioned' is:

    For module docstrings:

      [see Trove discussion for a good starting set -- this discussion
      has been had!]

    For class docstrings:

      [To be determined]

    For method docstrings:

      [To be determined]

    For function docstrings:

      [To be determined]

 
Miscellaneous Thoughts:

  I chose the double-colon notation for keywords so that text
  paragraphs can begin with the common 'word:' pattern without being
  interpreted as keywords.

  Does this proposal make docstrings whitespace-heavy?  Requiring a
  blank line between paragraphs means that many lines are blank,
  especially in bulleted lists.

  The above was (quickly) written with parsing in mind.  Is it really
  easily parseable?  If not, what needs to be changed so that it is
  parseable?

  I also wanted to make sure that syntax errors could be flagged early
  and 'localized' to aid debugging.  I'm not sure that I did that
  carefully enough.

  Are there normal uses in docstrings where one wants to turn off the
  automatic link detection?

  Is there value in having string interpolation?  David Arnold mentioned

       __version__ = "$Revision$[11:-2]
       __date__ = "$Date$

    which raises some issues.  I don't think that having [11:-2]
    evaluated by the docstring parser is a wise idea.  However, I can
    imagine that the module author could do:

       __version__ = "$Revision$"[11:-2]

    in the Python code, and then

       Version:: %(__version__)s
 
    in the docstring and that such a simple string interpolation
    mechanism could have value.  I'm not sure it's worth the
    complication though.  What dictionary would be used to do the
    interpolation?

Hopefully constructively, 

--david

PS: It goes without saying that while I railed against design by
committee, I am of course hopeful for feedback, for technical reasons
(dummy, you forgot special cases X, Y and Z!) and because I realize that a
standards proposal needs at least broad agreement if not consensus to be
effective in the long run.  The sharper-eyed will note that I stacked the
deck in my favor in the above proposal by including what Guido does
naturally as valid in the proposed grammar.