[Doc-SIG] docstring grammar

M.-A. Lemburg mal@lemburg.com
Wed, 01 Dec 1999 12:43:19 +0100


Edward Welbourne wrote:
> 
> > Since [] is only used for lists in Python, we could
> > define the RE '\[[a-zA-Z0-9_.]+\]' for our purposes and
> > raise an exception in case the enclosed reference cannot
> > be mapped to a symbol in the global namespace (note: no
> > whitespace, no commas) which either evaluates to a function,
> > method, module or reference object.
> 
> umm ... hang on, two things seem stirred up here.  The proposal I
> remember from ages ago and tried to echo has [token] and the token
> doesn't have to be intelligible to the python engine: elsewhere in the
> doc string, we'll have
> 
> References:
>    [token] reference text
> 
> which the parsed docstring uses to decode each use of [token] that
> appeared in the docstring.

Right, but we extended the lookup notion to what David summarized
in a recent post:

I believe that the namespaces looked up should be:

  1) the local namespace of the docstring -- i.e., the set of keywords
     defined in the "References" keyword block in the current docstring.
  2) the global namespace of the docstrings -- i.e. the set of keywords 
     defined in the "References" keyword block in the MODULE docstring.
  3) The global Python namespace for that module
  4) Some namespace corresponding to builtins & unimported modules, yet
     ill-defined.

+ I would like to add:

     The looked up object will only be converted to a reference
     if it is either an object having a doc string, or a reference
     object (these are created through the References: section).
     In case this condition is not met, either a warning is
     issued or the [token] text is taken as is.

+ modify the RE to include hyphens:

     '\[[a-zA-Z0-9_.-]+\]'

Given the above, [None] would then either cause a warning or
be left in the doc string with no further magic applied.
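
A minimal sketch of that lookup, just to make the order concrete.
The names Reference and resolve and the three dict arguments are
made up for illustration; namespace 4 is left out because it is
still ill-defined, and dotted tokens are not resolved attribute by
attribute here. "Has a doc string" is approximated as "has a
non-empty __doc__", which is an assumption (most instances inherit
one from their type).

```python
class Reference:
    """Hypothetical reference object built from a References: block."""

    def __init__(self, text):
        self.text = text


def resolve(token, local_refs, module_refs, module_globals):
    """Return the object a [token] refers to, or None to leave it as is."""
    # Namespaces 1-3 from the list above, searched in that order.
    for namespace in (local_refs, module_refs, module_globals):
        if token in namespace:
            obj = namespace[token]
            break
    else:
        # Not found anywhere: the caller warns or keeps the text verbatim.
        return None
    # Only reference objects and objects with a doc string qualify.
    if isinstance(obj, Reference) or getattr(obj, '__doc__', None):
        return obj
    return None
```

A real reference engine would decide whether a None result means
"warn" or "leave the text alone"; the sketch only reports it.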

Other uses of square brackets would have to include at least
one of the characters not allowed by the above RE, e.g. spaces.
This makes mixing [references] and [ code, examples ] very
simple and straightforward.
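
To illustrate, here is a small sketch using the RE from above; the
helper name find_references and the sample text are made up. Note
that a bare subscript such as y[0] still matches the RE, so under
this scheme it would reach the lookup step and, presumably, be
left as is there.

```python
import re

# The reference RE proposed above (with hyphens allowed).
REFERENCE_RE = re.compile(r'\[[a-zA-Z0-9_.-]+\]')


def find_references(text):
    """Return every [token] candidate found in text, brackets included."""
    return REFERENCE_RE.findall(text)


sample = "See [os.path] and [my-ref]; x = [ 1, 2 ] and y[0]."
# [ 1, 2 ] contains spaces and a comma, so it is not a candidate;
# [0] contains only allowed characters, so it is.
print(find_references(sample))
```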

As always, the details of how to convert the reference to
markup should be left to a reference engine. We should focus
on tokenizing first and only then start thinking about
what to do with those tokens... e.g. automagically convert
them to HTML anchors or whatever.

AFAICT, we have these tokens and symbols:

Keyword:

  A Keyword is a case-sensitive string which:
      - starts a paragraph
      - matches  '^ *[a-zA-Z_]+[\-a-zA-Z_0-9]*: +' 
        (a Python identifier, with hyphens also allowed after the
        first character, followed by a : and one or more spaces)
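
A quick sketch of how this RE behaves. The helper keyword_of is
hypothetical, and the grouping parentheses are added here only to
pull out the keyword name. One thing it shows: because of the
trailing ' +', a keyword that ends its line (as in a "References:"
line followed by an indented block) does not match unless the
colon is followed by at least one space.

```python
import re

# Same RE as above, with a group around the keyword name.
KEYWORD_RE = re.compile(r'^ *([a-zA-Z_]+[\-a-zA-Z_0-9]*): +')


def keyword_of(line):
    """Return the keyword that starts line, or None if there is none."""
    match = KEYWORD_RE.match(line)
    return match.group(1) if match else None
```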

Keyword Block:

  A Keyword Block is a paragraph of text starting with
  a Keyword and followed by Single Line Text or a Text Block.

Reference:

  A Reference is a case-sensitive string which:
     - matches '\[[a-zA-Z0-9_.-]+\]'
  (lookup as indicated above is left to the reference engine to
   implement)

Single Line Text:

  Single Line Text is all remaining text on the current line.

Text Block:
  
  A Text Block is a paragraph of indented text.

Bullet Block:

  A Bullet Block is a paragraph of indented text using
  a bullet character as first non-whitespace character at the
  indentation index.

First Line:
  
  A line of text matching <RE for "name(args,kws) -> returns -- does">

Blank Lines:

  One or more lines of whitespace text.

All Blocks may be nested (is this true?). Nesting is indicated by
indentation level.
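
A rough sketch of a first tokenizing pass over these definitions,
under several assumptions: paragraphs are separated by Blank Lines,
the nesting level is taken from the indentation of a paragraph's
first line, and the bullet characters are '-' and '*' followed by
a space (nothing above fixes which characters count). The names
split_paragraphs and classify are invented for the example, and
First Line handling is left out.

```python
import re

KEYWORD_RE = re.compile(r'^ *[a-zA-Z_]+[\-a-zA-Z_0-9]*: +')
BULLETS = '-*'  # assumption: the bullet characters are not specified


def split_paragraphs(docstring):
    """Split text into paragraphs (lists of lines) at Blank Lines."""
    paragraphs, current = [], []
    for line in docstring.expandtabs().splitlines():
        if line.strip():
            current.append(line)
        elif current:
            paragraphs.append(current)
            current = []
    if current:
        paragraphs.append(current)
    return paragraphs


def classify(paragraph):
    """Return (kind, indent) for one paragraph, judged by its first line."""
    first = paragraph[0]
    stripped = first.lstrip(' ')
    indent = len(first) - len(stripped)
    if KEYWORD_RE.match(first):
        kind = 'keyword'
    elif stripped[0] in BULLETS and stripped[1:2] == ' ':
        kind = 'bullet'
    else:
        kind = 'text'
    return kind, indent
```

A later pass could turn the indent values into a tree; whether
every block type may really nest is the open question above.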


Anything missing?

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000:                                                    30 days left
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/