[melbourne-pug] Docstring parsing and formatting

Tue Sep 18 04:53:17 CEST 2012

Hi Ben,

I haven't seen anything for PEP 257, but there is the numpy docstring formatting standard that is used to produce the numpy documentation in Sphinx.

Under the numpy standard, the docstring (module, class, or function) is written in reStructuredText, for example (numpy.arange.__doc__):

    """
    arange([start,] stop[, step,], dtype=None)

    Return evenly spaced values within a given interval.

    Values are generated within the half-open interval ``[start, stop)``
    (in other words, the interval including `start` but excluding `stop`).
    For integer arguments the function is equivalent to the Python built-in
    `range <http://docs.python.org/lib/built-in-funcs.html>`_ function,
    but returns a ndarray rather than a list.

    When using a non-integer step, such as 0.1, the results will often not
    be consistent.  It is better to use ``linspace`` for these cases.

    Parameters
    ----------
    start : number, optional
        Start of interval.  The interval includes this value.  The default
        start value is 0.
    stop : number
        End of interval.  The interval does not include this value, except
        in some cases where `step` is not an integer and floating point
        round-off affects the length of `out`.
    step : number, optional
        Spacing between values.  For any output `out`, this is the distance
        between two adjacent values, ``out[i+1] - out[i]``.  The default
        step size is 1.  If `step` is specified, `start` must also be given.
    dtype : dtype
        The type of the output array.  If `dtype` is not given, infer the data
        type from the other input arguments.

    Returns
    -------
    out : ndarray
        Array of evenly spaced values.

        For floating point arguments, the length of the result is
        ``ceil((stop - start)/step)``.  Because of floating point overflow,
        this rule may result in the last element of `out` being greater
        than `stop`.

    See Also
    --------
    linspace : Evenly spaced numbers with careful handling of endpoints.
    ogrid: Arrays of evenly spaced numbers in N-dimensions
    mgrid: Grid-shaped arrays of evenly spaced numbers in N-dimensions

    Examples
    --------
    >>> np.arange(3)
    array([0, 1, 2])
    >>> np.arange(3.0)
    array([ 0.,  1.,  2.])
    >>> np.arange(3,7)
    array([3, 4, 5, 6])
    >>> np.arange(3,7,2)
    array([3, 5])
    """

The format can then be rendered using Sphinx, which generates appropriate formatting and links for the different elements (e.g. parameters, types, returns, see also, examples). It might also be possible to use or extend the parser in the Sphinx extension module for more general purposes. I have used the format to generate documentation in non-numpy projects. It is nice because it is quite readable in unrendered form but can also be parsed into documentation. It is somewhat verbose; however, not all sections in the docstring are mandatory (e.g. "See Also" and "Examples" are optional).
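To give a rough idea of why the format is easy to parse, here is a minimal sketch that splits a numpy-style docstring into its sections by looking for a title line followed by a line of dashes. This is not the numpydoc parser; the function name `split_numpydoc_sections` is made up for illustration, and it ignores details such as the structure inside each section.

```python
import inspect
import re


def split_numpydoc_sections(docstring):
    """Split a numpy-style docstring into {section title: list of body lines}.

    Hypothetical helper for illustration only. Text before the first
    section header is stored under the empty-string key.
    """
    # inspect.cleandoc removes the common indentation and the
    # leading/trailing blank lines.
    lines = inspect.cleandoc(docstring).splitlines()
    sections = {"": []}
    current = ""
    i = 0
    while i < len(lines):
        line = lines[i]
        following = lines[i + 1] if i + 1 < len(lines) else ""
        # A section header is a non-empty line underlined with dashes.
        if line.strip() and re.fullmatch(r"-{3,}", following.strip()):
            current = line.strip()
            sections[current] = []
            i += 2  # skip the title and its underline
            continue
        sections[current].append(line)
        i += 1
    return sections
```

Calling it on numpy.arange.__doc__ should give keys such as "Parameters", "Returns", "See Also" and "Examples", with the signature and description under the empty key.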

Format Standard:
    https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt

Sphinx Extension:
    http://pypi.python.org/pypi/numpydoc
    https://github.com/numpy/numpy/tree/master/doc/sphinxext
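
As for the summary/description split you asked about: I don't know of a ready-made function that returns both parts, but `inspect.cleandoc` in the standard library does the indentation stripping in much the same way as the algorithm in PEP 257. A minimal sketch built on top of it could look like the following. The name `pep257_split` is my own invention (it is not in `textwrap` or anywhere else in the standard library), and it assumes the summary is everything up to the first blank line.

```python
import inspect
import re


def pep257_split(docstring):
    """Return (summary, description) for a docstring.

    Hypothetical helper, not part of the standard library. Assumes the
    summary is the text before the first blank line.
    """
    if not docstring:
        return "", ""
    # Strip the uniform indentation and surrounding blank lines.
    cleaned = inspect.cleandoc(docstring)
    # Split once, at the first blank (or whitespace-only) line.
    parts = re.split(r"\n[ \t]*\n", cleaned, maxsplit=1)
    # Collapse a wrapped summary onto a single line.
    summary = " ".join(parts[0].split())
    description = parts[1].strip() if len(parts) > 1 else ""
    return summary, description
```

With that, `(summary, description) = pep257_split(foo.__doc__)` behaves roughly like the call you wrote in your example.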

Cheers,
Josh

On 18/09/2012, at 12:08 PM, Ben Finney wrote:

> Howdy all,
> 
> Where can I find a standard implementation of the docstring parsing and
> splitting algorithm from PEP 257?
> 
> 
> PEP 257 describes a convention of structure and formatting for
> docstrings <URL: http://www.python.org/dev/peps/pep-0257/>. Docstrings
> that conform to this convention can therefore be parsed into their
> component parts, and re-formatted.
> 
> The PEP describes <URL: http://www.python.org/dev/peps/pep-0257/#id20>
> an algorithm for parsing the docstring as found in the string literal.
> It says “Docstring processing tools will …” and goes on to describe, in
> prose and example code, how the parsing should be done.
> 
> Where is a common implementation of that algorithm? It seems that it
> should be in the Python standard library, but I can't find it.
> 
> Ideally what I want is to be able to write:
> 
>    import textwrap
> 
>    (summary, description) = textwrap.pep257_parse(foo.__doc__)
> 
> and have ‘summary’ as the docstring's summary line, and ‘description’ as
> the docstring's description (as described in <URL:
> http://www.python.org/dev/peps/pep-0257/#id19>).
