[Doc-SIG] Evolution of library documentation

Edward D. Loper edloper@gradient.cis.upenn.edu
Mon, 12 Mar 2001 15:38:03 EST


I think that for the most part, I tend to side with Tibbs on this.
From my perspective, docstrings should include clear, concise,
and unambiguous *definitions* of Python entities.  In other words,
by reading a docstring, you should know *exactly* what a function/
method/class/etc. is guaranteed to do.  Note that a definition may
leave some details unspecified.  For example, a function defined to
"return a list containing the prime numbers between 1 and n, 
exclusive" is free to order that list however it wants.  This helps
enormously when trying to read source code, because it lets you
know what parts of a function are there because they're part of
the function's definition, and what parts are just an implementation
decision.  
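
A minimal sketch of such a definition.  The name primes_between and
the trial-division body are illustrative assumptions, not anything
from the library.  The docstring promises the contents of the list and
says nothing about its order, so the descending order below is purely
an implementation decision:

```python
def primes_between(n):
    """Return a list containing the prime numbers between 1 and n,
    exclusive.
    """
    # Implementation decision, not part of the definition: trial
    # division, with candidates visited from n - 1 down to 2.
    found = []
    for candidate in range(n - 1, 1, -1):
        limit = int(candidate ** 0.5) + 1
        if all(candidate % d for d in range(2, limit)):
            found.append(candidate)
    return found
```

A caller who relies only on the definition would compare
sorted(primes_between(10)) against [2, 3, 5, 7] rather than depend on
the order the list happens to come back in.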

Note that this is generally exactly the type of documentation you
want in a reference manual.  I think that the idea of building
the reference manuals from docstrings makes a lot of sense.  I do
*not* think that tutorials, howtos, big examples with
explanations, etc. belong in the docstrings.

> (although I would point out it's
> *significantly* easier to "have TeX" than, for instance, to "have CVS",
> which one is also required to have to do development with modern Python
> (a *serious* problem for some of us).

This is pretty much completely a digression, but I figured I'd chime 
in.  Regardless of how easy it is to download CVS or LaTeX, it is
much easier to learn how to *use* CVS than to learn how to use
LaTeX.  Given a knowledgeable teacher, you can learn everything you
need to know about CVS in an hour.  I'm not sure anyone ever learns
everything they need to know about LaTeX. :)  (This is coming from
many years of experience using both pieces of software -- and I
personally think that they're *both* great, and everyone should learn
them both. :) )

> > This would address the duplication problem and
> > also keep all of a module's documentation in one place together with
> > the module.
> 
> Now, if you said "package" I'd be happy, but since it's "module", I'll
> gripe.

The module's definition should be kept with the module (or enough
background explanation to allow one to define every class/function/
method/var in the module).  It *may* make sense to keep other types 
of documentation in the module (??) or the package, but it's less 
clear.  But if they are given a place, I don't think it should be in
docstrings.

> > To avoid forcing you to page through a huge docstring
> > before getting to the source code, we would allow a long docstring to
> > go at the end of the file (or maybe collect docstrings from anywhere
> > in the file).
> 
> Aagh! No, sorry, my problem wouldn't be with paging (although that *is*
> a problem - and why is the end of the file so different than the
> front? - I page from both ends, depending on context!).

In general, well-written definitions should be fairly short, so this
shouldn't be TOO much of a problem.  But one issue that I do remember
people having is that docstrings are kept at runtime (which I think
is great for what I'm saying they should do), and people are concerned
that they will eat up too many resources.  Is this really a problem
or am I just misremembering something?
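
For reference, a small sketch of what "kept at runtime" means.  A
docstring is stored on the object as __doc__, and compiling at
optimization level 2 (which is what "python -OO" selects) drops it.
The sketch assumes a modern Python where compile() accepts an
optimize argument; the function f and its source text are made up for
illustration:

```python
SOURCE = '''
def f():
    """Return 42."""
    return 42
'''

def load(optimize):
    # Compile and run SOURCE at the given optimization level, then
    # return the resulting function object.
    namespace = {}
    code = compile(SOURCE, "<sketch>", "exec", optimize=optimize)
    exec(code, namespace)
    return namespace["f"]

kept = load(0).__doc__      # docstring retained at runtime
stripped = load(2).__doc__  # docstring removed, as under "python -OO"
```

Here kept is the string "Return 42." and stripped is None, while the
function itself behaves the same either way.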

> Source files are for source code. I want to be able to *treat* them as
> such. It is quite possible for a two page source to have ten or more
> pages of documentation associated with it. That does *not* belong in the
> same *file* as the source - if someone *wants* to associate them
> closely, the correct way to do it is with a *package*.

I think that the definitions, however big, should be kept with the
source code.  (Of course, if someone needs 10 pages to define the
behavior of 2 pages of source, something's wrong).  But I agree that
everything else should either be kept in the package (where appropriate)
or at a higher level (for docs that span packages).

> Also, *because* one might have more than one sort of "grander scope"
> documentation for a module/package, you will have to consider
> *supporting* more than one. Difficult if it is "just" a string tacked on
> the end.

I think that we should leave the organization of "grander scope" 
documentation to a different project.  (Of course, it's still
an important project.)

> The reason for adopting ST (or some variant) for markup in docstrings
> is, basically, because it is acknowledged that many people will not
> create docstrings with more markup than that, or with more obtrusive
> markup than that.

I think it might make sense to reserve one character, or maybe 2, for
advanced markup.  (We would also want to be able to backquote it
somehow, but we'll leave that discussion for later.)  So, for example,
we could say that '@' is reserved for advanced markup, and then it
can be used by people who:
  1. want more advanced features
  2. are willing to use a "real" markup, which is more "obtrusive"
     and difficult to read/write.

I think that we should limit how much more complex "basic" ST gets,
as much as possible.

> Worse, if one tries to continue using "simple" markup in ST, one is
> going to end up with strained analogies, and with almost any
> non-alphanumeric character having a special meaning. Yuck (can we say
> Perl?).

I will be very upset if ST takes that type of turn.

> The obvious way round that is to start doing, well, markup - for
> instance, '@class(..)'
> or somesuch (like Pod, I think? - or GNU texinfo). In which case we're
> inventing our own little markup language again, with none of the reasons
> for doing it that went into ST.

I agree that it doesn't make sense to make up our own markup language.
So maybe reserve (non-backslashed) '<' and '>' and use XML for
advanced markup?  

Of course, depending on what we decide we need, we may be able to get
away with much less.  For example, define things like::

  Parameters:
     p1 -- foo
     p2 -- bar

to have special meaning in the context of docstrings.  Any standard
(STpy-like) ST will just render them as a heading with a list.
You can also define special forms like::

  author -- Edward Loper
  version -- 2.71828

That takes care of most of the structural-type markup, and still looks
ok if you just read it, or parse it with non-docstring-specific ST
tools.
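
A rough sketch of how a docstring-aware tool might pick up these
forms.  The name parse_fields, the rule that a line ending in ":"
opens a section, and the rule that a blank line closes it are all my
assumptions for illustration, not a proposed spec:

```python
import re

# A "name -- description" line, as in the forms above.
FIELD = re.compile(r"^\s*(\w+)\s+--\s+(.*\S)\s*$")

def parse_fields(docstring):
    """Return {section: {name: description}}; fields that appear
    outside any section are stored under the key None.
    """
    fields = {}
    section = None
    for line in docstring.splitlines():
        stripped = line.strip()
        if not stripped:
            section = None            # a blank line closes the section
        elif stripped.endswith(":") and " -- " not in stripped:
            section = stripped[:-1]   # e.g. "Parameters:" opens one
        else:
            match = FIELD.match(line)
            if match:
                entry = fields.setdefault(section, {})
                entry[match.group(1)] = match.group(2)
    return fields

EXAMPLE = """Frobnicate p1 using p2.

Parameters:
   p1 -- foo
   p2 -- bar

author -- Edward Loper
version -- 2.71828
"""

fields = parse_fields(EXAMPLE)
```

Lines that match neither form are ignored, so ordinary prose in the
docstring passes through untouched, and a tool that knows nothing
about these forms still sees plain ST.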

Then you just have to worry about syntax for inline markup.  I don't
have any great ideas there, other than having a whole class of
"advanced inline markup" tags like @somethingorother(...)

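To make the inline idea concrete, here is a small sketch of splitting
text on tags of the form @name(...).  The name split_inline, the use
of "@@" as the way to write a literal "@", and the restriction to
unnested parentheses are all assumptions on my part; nothing here has
been decided:

```python
import re

# "@name(body)" is a tag; "@@" stands for a literal "@".  Bodies may
# not contain parentheses in this sketch.
INLINE = re.compile(r"@@|@(\w+)\(([^()]*)\)")

def split_inline(text):
    """Split text into ("text", s) and ("tag", name, body) pieces."""
    pieces = []
    pos = 0
    for match in INLINE.finditer(text):
        if match.start() > pos:
            pieces.append(("text", text[pos:match.start()]))
        if match.group(0) == "@@":
            pieces.append(("text", "@"))
        else:
            pieces.append(("tag", match.group(1), match.group(2)))
        pos = match.end()
    if pos < len(text):
        pieces.append(("text", text[pos:]))
    return pieces

pieces = split_inline("See @class(Parser) for a literal @@ sign")
```

A plain ST renderer could simply pass such tags through unchanged,
while a docstring tool would map each ("tag", name, body) piece to
whatever output it supports.
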
Anyway, I have a meeting to get to.  :)

-Edward