[Doc-SIG] Automated doc string processing systems

Mon, 5 Mar 2001 10:30:06 -0000

Edward D. Loper wrote:
> I've recently been reading up on the status of automated documentation
> extraction from formatted inline comments in Python.  It sounds like
> one day, a tool for extraction may become part of the
> standard library..

As Ka-Ping Yee says elsewhere, already a done deal.

> (PEP 224).  I'm trying to find out more about what the current status
> of this is, who is actively working on such tools, and how I can
> help.  I'm pretty new to the area, but as far as I can tell there are
> 3 such tools currently in active development:
>   * Happydoc
>   * Pydoc
>   * Docutils
>
> Are all of these actually being actively developed?  Is one more likely
> than the others to become the standard?  I'd like to help work on this
> project, but I'm not sure who I should be contacting, and I'm pretty
> new to the area.

Well, the three are all doing different things.

Ka-Ping Yee's pydoc will be in the standard distribution. It provides a
command line utility (similar to Unix man in functionality, but aimed at
Python modules). It provides an interactive "help" command, which can be
used from the Python prompt. And it provides the ability to generate
HTML pages. It doesn't yet address the *formatting* of the insides of
docstrings.
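
For a flavour of those uses, here is a minimal sketch, assuming pydoc can
be imported as a standard library module. The greet function is made up
purely for illustration, and the exact output layout may differ between
Python versions, so treat this as a rough guide rather than a reference.

```python
import pydoc


def greet(name):
    """Return a greeting for *name*."""
    return "Hello, " + name


# Plain-text documentation, much like what `python -m pydoc <name>` prints
# on the command line. plain() strips the overstrike bold formatting.
text_doc = pydoc.plain(pydoc.render_doc(greet))

# HTML documentation for the same object, returned as a string.
html_doc = pydoc.html.document(greet)

if __name__ == "__main__":
    print(text_doc)
    # At the interactive prompt, help(greet) shows the same text in a pager.
```

Note that the docstring itself is passed through more or less untouched,
which is the formatting gap mentioned above.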

Doug Hellmann's HappyDoc is an independent effort. It does not provide
the "interactive help" facility, and its aims are rather different in
philosophy - I think it is rather more ambitious in some ways about what
it wants to do. It already has an attempt at interpreting StructuredText
within doc strings. It wants to support *lots* of output formats. It
will look in comments as well as in docstrings. Whilst it isn't in the
standard distribution, it has been going a while, and is being used by
various people.

docutils (by me) is an attempt to provide the interpretation of the
*inside* of a docstring. It results from many discussions over the years
on the Doc-SIG about what those insides should look like, and is
basically an evolved form of StructuredText (similar to and hopefully
compatible with StructuredTextNG, which is, maybe, being developed by
the Zope people). Although it has a simple command line interface, and
can produce HTML, its main aim is to be used by utilities like the other
two. Oh, and it's not finished yet.

There have been other players, as well - pythondoc, Marc Lemburg's
doc.py, crystal, for instance - not all of whom have necessarily
abandoned work just because of what we are doing. The innards of Zope
use StructuredText, and they parse documentation strings as well.

Personally, I don't see a problem with having multiple tools. It is
*essential* to have a tool in the standard distribution, and pydoc is
just perfect for that purpose (it addresses so many of the needs all at
once). It is slightly less essential, but still important, to have a
common definition of how one *writes* (formats) a docstring, and some of
the documentation for that has been written. Leveraging off something
many people already (more or less) use is important here. My experiences
of writing docutils will lead to a firmer explication of that, plus a
tool that *works* with the explication. I'll be extremely happy if a
second implementation of *that* appears, as well.

> Also, I wrote a short essay trying to list many of the issues that
> such doc tools must deal with in one place, and discuss those issues.
> I'd appreciate it if you could give me any feedback on it.  It's
> available at:
>     http://www.cis.upenn.edu/~edloper/pythondoc.html

It's Monday, I've just gotten into work, and I haven't had time to read
your document properly yet. From a brief scan, it looks as if it is
working at a slight tangent to the ST (StructuredText) initiative - not
necessarily a bad thing. One thing to bear in mind, though, is that
experience shows that MOST PEOPLE will not use "heavy" markup in
docstrings - so HTML, XML and TeX are right out. Instead they want
something easy to write, and easy to read *without processing* - this is
why ST was adopted (I was initially a huge opponent of this, as I *like*
markup, but I realised that practice wins over theory). I think that the
javadoc type of information probably counts as "heavyweight" as well -
it's certainly not readable without processing.

Hmm - looking at your example:

"""
  @cvariable(v) The v field of the class
  @type(v) int
  @ivariable(i) The i field of the instance.  Note that
     descriptions can continue onto the next line and can
     include *formatting*.
  @type(i) float
  @see(otherClass) that other class
  @author Edward Loper
  @author Another author
  @version $Id:$
"""

I suspect that one would instead write it something like:

"""
  Significant class values
    #v# - an integer, representing variability
    #i# - perhaps confusingly, a real value.
          (It's going to play merry hell if I ever want
          to search for this, with such a short name.)

  Also see ^#other_class# for similar ideas.

  Authors:
     * Edward Loper
     * Someone J. Else

  Version:
     <a silly version id>
"""

(there are a couple of things in there that won't work in current STpy,
at least one of which may be contentious, but the main point is that it
is much more like a normal text than a piece of form-filling, which
encourages explanation.)
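
Text like that can still be picked apart by a tool. As a minimal sketch,
assuming that #name# marks a Python name and ^#name# marks a
cross-reference (my reading of the example above, and *not* how docutils
actually does it), a couple of regular expressions are enough. The names
LITERAL, REFERENCE and find_markup are made up for illustration.

```python
import re

DOCSTRING = """
  Significant class values
    #v# - an integer, representing variability
    #i# - perhaps confusingly, a real value.

  Also see ^#other_class# for similar ideas.
"""

# Assumed conventions, for illustration only:
#   ^#name#  is a cross-reference to another object
#   #name#   (with no leading ^) is a Python name used literally
REFERENCE = re.compile(r"\^#(\w+)#")
LITERAL = re.compile(r"(?<!\^)#(\w+)#")


def find_markup(text):
    """Return the marked-up names found in a docstring, by kind."""
    return {
        "literals": LITERAL.findall(text),
        "references": REFERENCE.findall(text),
    }


if __name__ == "__main__":
    print(find_markup(DOCSTRING))
```

The docstring stays readable as it stands, and the tool only has to
recognise a handful of unobtrusive markers.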

Sorry for being brief - I hope it doesn't come across as impolite. David
Goodger also had a swathe of significant comments on the innards of
docstrings, some while back, which one day I want to go through and
comment on and maybe nick ideas from. I shall hang on to your document
as well.

(Hmm - that sounds a bit like I think *I'm* deciding how STpy works. I
admit it! <fx:fiendish maniacal laughter>. Well, actually, when the
first implementation gets a bit firmer, I'm hoping to prod people into
commenting on some of the subtler minutiae of the way forwards - of
course, if no one cares, *then* I get to rule the world... <fx:as above>)

As to contributed work - playing with docutils as it advances, and more
importantly commenting on the "specification" at

	http://www.tibsnjoan.co.uk/docutils/STpy.html

would be useful - unfortunately, the document is a little light on *why*
features are as they are.

Tibs

--
Tony J Ibbs (Tibs)      http://www.tibsnjoan.co.uk/
Give a pedant an inch and they'll take 25.4mm
(once they've established you're talking a post-1959 inch, of course)
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
