[SciPy-user] Docstring standards for NumPy and SciPy

Tue Jan 16 13:52:53 EST 2007

[I sent this 5 days ago, but it's been held because I was not  
subscribed -- so I decided to just go ahead & subscribe and resend  
it.  Apologies if it ends up being a dup.]

I'm glad to hear that you're making a push towards using standardized  
markup in docstrings -- I think this is a worthy goal.  I wanted to  
respond to a few points that have come up, though.

First, I'd pretty strongly recommend against inventing your own  
markup language.  It increases the barrier for contributions, makes  
life more difficult for tools, and takes up that much more brain  
space that could be devoted to better things.  Plus, it's  
surprisingly hard to do right, even if you're translating from your  
markup to an existing one -- there are just too many corner cases to  
consider.  I know Travis has reservations about the amount of 'line  
noise,' but believe me, there are good reasons why that 'line noise'  
is there, and the authors of ReST have done a *very* good job at  
keeping it to a minimum.

Given the expressive power that's needed for scipy docs, I would  
recommend using ReST.  Epytext is a much simpler markup language, and  
most likely won't be expressive enough.  (e.g., it has no support for  
tables.)

Whatever markup language you settle on, be sure to indicate it by  
setting module-level __docformat__ variables, as described in PEP  
258.  __docformat__ should be a string containing the name of the  
module's markup language. The name of the markup language may  
optionally be followed by a language code (such as en for English).  
Conventionally, the definition of the __docformat__ variable  
immediately follows the module's docstring.  E.g.:

   __docformat__ = 'restructuredtext'

Other standard values include 'plaintext' and 'epytext'.

As for extending ReST and/or epydoc to support any specializations  
you want to make, I don't think it'll be that hard.  E.g., adding  
'input' and 'output' as aliases for 'parameters' and 'returns' is  
pretty simple.  And adding support for generating latex-math should  
be pretty straightforward.  I think concerns about the markup for  
marking latex-math are perhaps exaggerated, given that the *contents*  
of latex-math expressions are quite likely to look like line-noise to  
the uninitiated. :)  I've patched my local version of docutils to  
support inline math with `x=12`:math: and block math with:

.. math:: F(x,y;w) = \langle w, \Phi(x,y) \rangle

And I've been pretty happy with how well it reads.  And for people  
who aren't latex gurus, it may be more obvious what's going on if  
they see :math:`..big latex expr..` than if they just see $..big  
latex expr..$.
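
As a rough illustration of how this markup reads inside a real docstring, here is a minimal sketch.  The function `score` and its parameter names are hypothetical, and rendering the math role and directive assumes a docutils that supports them (at the time of this message that meant a locally patched copy).

```python
__docformat__ = 'restructuredtext'

def score(w, phi):
    r"""Return the linear score `F(x,y;w)`:math: for one feature vector.

    .. math:: F(x,y;w) = \langle w, \Phi(x,y) \rangle

    :Parameters:
        `w` : sequence of float
            Weight vector.
        `phi` : sequence of float
            Feature vector :math:`\Phi(x,y)`, the same length as `w`.
    """
    # Hypothetical implementation: a plain inner product of w and phi.
    if len(w) != len(phi):
        raise ValueError("w and phi must have the same length")
    return sum(wi * pi for wi, pi in zip(w, phi))
```

The docstring is a raw string (`r"""`) so that the backslashes in the latex expressions reach the markup parser unchanged.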

If you really think that's too line-noise-like, then you could set  
the default role to be math, so `x=12` would render as math.  But  
then you'd need to explicitly mark crossreferences, so I doubt that  
would be a win overall.

[Alan Isaac]

> Must items (e.g., parameters) in a consolidated field be
> marked as interpreted text (with back ticks).
>     Yes.  It does seem redundant, so I will ask why.
>

I wouldn't mind changing this to work both with & without the  
backticks around parameter names.  At the time when I implemented it,  
I just checked what the standard practice within docutils for writing  
consolidated fields was, and wrote a parser for that.

[Alan Isaac]

> Would it not be nice to have :Inputs: and :Outputs:
> consolidated fields as synonyms for :Parameters:
> and :Returns:?
>     Yes!  Perhaps Ed Loper would be willing to add this.
>

The only concern might be if other projects have defined  
custom :input: and :output: fields that they use for other purposes --  
I'll try to check if this is the case.  In the meantime, the  
following should do what you want:

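# Handle :output: fields the same way as :return: fields.
from epydoc.docstringparser import register_field_handler, process_return_field
register_field_handler(process_return_field, 'output')

# Let :input: act as a consolidated field that expands to :param: entries.
from epydoc.markup import restructuredtext as epytext_rst
epytext_rst.CONSOLIDATED_FIELDS['input'] = 'param'
epytext_rst.CONSOLIDATED_DEFLIST_FIELDS.append('input')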

[Alan Isaac]

> Is Epydoc easily customizable?
>     In what ways?  It is easy to add new fields
>     (see above), but I do not know about new
>     consolidated fields.
>

I intend for epydoc to be easily customizable, but at the moment it's  
only customizable in those places where I've thought to make it  
customizable.  If you find there's some customization you'd like to  
do, but there's no hook for it, let me know & I can try to think  
about what kind of hook would be appropriate.

[Alan Isaac]

> Is table support adequate in reST?
>

See <http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#tables>

If ReST table support isn't expressive enough for you, then you must  
be using some pretty complex tables. :)
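
For a sense of how a ReST "simple table" looks in a docstring, here is a small sketch.  The function `mode_code` and the modes it lists are hypothetical and exist only to give the table something to document.

```python
__docformat__ = 'restructuredtext'

# Hypothetical lookup table backing the documented function.
_MODE_CODES = {'raise': 0, 'wrap': 1, 'clip': 2}

def mode_code(mode):
    """Return the integer code for an out-of-bounds handling mode.

    The accepted modes are:

    =========  ====  ================================
    Mode       Code  Meaning
    =========  ====  ================================
    ``raise``  0     raise an error if out of range
    ``wrap``   1     wrap around to the valid range
    ``clip``   2     clip to the nearest valid value
    =========  ====  ================================
    """
    return _MODE_CODES[mode]
```

Each cell has to fit within the `=` ruler of its column (the last column may run over), which is the main thing to watch when editing such tables by hand.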

[Alan Isaac]

>     math, so we could inline `f(x)=x^2` rather than
>     :latex-math:`f(x)=x^2`.
>

As I noted above, this would mean you'd have to explicitly mark  
crossreferences to python objects with some tag -- rst can't read  
your mind to know whether `foo` refers to a math expression or a  
variable.

> It may be worth asking whether
>     epydoc developers would be willing to pass $f(x)=x^2$
>     as latex-math.
>

Overall, I'm reluctant to make changes to the markup language(s)  
themselves that aren't supported by the markup language's own  
extension facilities.

> Why use underlining to define sections?
>     So that they are really sections.
>     The indented examples will display fine
>     but will not give access to sectioning controls.
>

If you don't use underlining, you'll get definition lists instead of  
sections.  It would be possible to register a transformation w/ ReST  
that checks for top-level definition lists & transforms them to  
sections, but I doubt it's worth it.  In my experience, the only time  
when you need to add section headings within a docstring is if the  
docstring is quite long, and in that case the underlining doesn't  
bother me too much.

[Gary Ruben]

> Currently epydoc generates far too much
> information (2371 pages worth when I ran it on the numpy source a few
> days ago) and seems unable to be easily modified to reduce its output.
>

If you can explicitly specify what you'd like included in the output,  
and how you'd like it formatted, then I can give you an idea of how  
hard that would be to produce.  You are right that, at the moment,  
epydoc's output generators are not terribly customizable.  And the  
latex output isn't as pretty as I'd like.  :)

[Gary Ruben]

> The other thing we want is to be able to generate examples from
> heavily marked-up example modules a'la what FiPy does. I don't think
> epydoc even allows that without modification.
>

For this, I highly recommend writing stand-alone doctest files, which  
can be run through docutils as-is to generate marked-up examples; and  
can be run through doctest to verify that all examples are correct.   
E.g., see:

   <http://epydoc.sourceforge.net/doctest/index.html>

Each of the files linked from that page is generated from an  
rst-formatted doctest file.
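
A minimal sketch of the verification half of that workflow, using only the standard library's doctest module.  The document text in `DOC` and the helper `run_doctest_text` are made up for illustration; a real project would keep the text in its own file.

```python
import doctest

# A tiny rst-formatted doctest document (hypothetical content).
DOC = """\
Rounding examples
=================

The built-in ``round`` returns an integer when given one argument:

    >>> round(2.7)
    3

Names defined in one example remain available in later ones:

    >>> total = sum([1, 2, 3])
    >>> total
    6
"""

def run_doctest_text(text, name='example.txt'):
    """Run every doctest example found in `text`.

    Returns doctest's TestResults tuple, with `failed` and `attempted` counts.
    """
    parser = doctest.DocTestParser()
    # Start from an empty namespace; the examples build up their own globals.
    test = parser.get_doctest(text, {}, name, name, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)

results = run_doctest_text(DOC)
print(results.failed, results.attempted)
```

When the text lives in a file on disk, `doctest.testfile()` does the same job in one call, and the very same file can be passed to the docutils front-end tools to produce the formatted examples.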

[Perry Greenfield]

> Any reason ipython can't use epydoc or some other tool to format the
> markup in ascii (I forget if epydoc does ascii output) so that the
> user doesn't see the 'line noise' when using the ipython
> introspection features?
>

If you add this to ipython, please be sure to check the __docformat__  
variable before deciding how to convert the docstring!  (If you  
encounter an unknown markup, then just render it as plaintext.)
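
A minimal sketch of such a check follows.  The helper name `docformat_of`, the `KNOWN_MARKUPS` set, and the choice to treat a missing `__docformat__` as plaintext are all assumptions made for illustration, not part of ipython or epydoc.

```python
import types

# Markup names this hypothetical tool knows how to render.
KNOWN_MARKUPS = {'plaintext', 'epytext', 'restructuredtext'}

def docformat_of(module, default='plaintext'):
    """Return the markup name declared by `module.__docformat__`.

    The value may carry an optional language code after the markup name
    (e.g. 'restructuredtext en'); only the first word is used.  Missing,
    malformed, or unknown values fall back to `default`.
    """
    value = getattr(module, '__docformat__', None)
    if not isinstance(value, str) or not value.strip():
        return default
    name = value.split()[0].lower()
    return name if name in KNOWN_MARKUPS else default

# Usage with a throwaway module object.
demo = types.ModuleType('demo')
demo.__docformat__ = 'restructuredtext en'
print(docformat_of(demo))  # prints: restructuredtext
```

A renderer would then dispatch on the returned name and show the docstring unmodified whenever the result is 'plaintext'.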

As a final note, epydoc is probably still missing some of the hooks  
that you'd need to specialize ReST without resorting to  
monkey-patching.  If you find that this is the case,  
please let me know what hooks you'd like to see added to epydoc.  Or  
if the construction you're trying to add is one that's likely to be  
useful to other epydoc users (e.g., latex-math), then it could  
certainly be added to epydoc itself.

-Edward

(disclaimer: I'm not subscribed to scipy-user; I just read the thread  
from the archives.  So please cc me on responses.)