[Doc-SIG] Ease of use is #1

Ka-Ping Yee ping@lfw.org
Sun, 6 Feb 2000 14:19:31 -0800 (PST)


To the moderator: please cancel my earlier message, which had an
attached example that was too big.  Instead of including it in the
message, i'll just put it on a website.

------------------

Before i begin, let me make an attempt to propose the two most important
goals of any documentation project.

    I. To encourage people to write lots of documentation.

    II. To make that documentation as accessible as possible.

There is an example at http://www.lfw.org/python/SocketServer.html ;
it is discussed in more detail below.


On Fri, 4 Feb 2000, Moshe Zadka wrote:
> Special tokens:
> 	@ -- escapes any character following it (i.e., @c is always translated
> 	to c)
> 	
> 	[ -- a short tag opener. Closed with a matching ]
> 
> 	::(newline) -- beginning a long tag. everything until the
> 	indent level returns to that of the line which started the long
> 	tag is part of the contents.
> 
> 	(newline)(newline) -- new paragraph.
> 
> The syntax of short tags is '[' tagname ' ' contents ']' where contents is
> any valid snippet of docstring, with the exception of a long tag.
> 
> The syntax of long tags is
> 
> tagname (attr '=' value)*'::'
> 	contents
[there follow some 450 lines of description]


My first reaction to this is --

    Holy crap!  This has gotten totally out of control!


It will do us no good for geeks like us -- the 1% of uber-geeks
*within* that 1% of geeks that Randy described -- to sit in ivory
towers describing elaborate syntaxes for marking up documentation
when most other readers won't even understand the syntax, let
alone use it in their own code.  Why, i wouldn't even bother to
do all of this marking up in my own documentation.

I sincerely hope Moshe won't take this personally -- i just
think that this is one example of far too many steps down a seductive
path in the wrong direction.  Why does one never see e-mail like
the following?

    <fragment>
        <adj/Holy/ <expl severity=mild>crap</expl>!
    </fragment>
    <sentence>
        <subj><pron/This/</subj> <pred><verb>has gotten</verb>
        <adjp><adv>totally</adv> <adv/out/ <prep/of/ <noun/control/</adjp>!
    </sentence>

Because:

    1. No one in their right mind would waste all that time typing.

The markup has two potential audiences: humans and machines.

    2. The markup isn't going to get used by any sort of mechanical
       parsing tool anyways.

    3. Any meaning that a human could obtain from the markup can be
       fairly well derived from context.


Now, let's translate these three points over into the world of
source code documentation.


[1.]

Point 1 stands as is.  Every obstacle that we introduce -- whether
a cognitive obstacle (more syntax to understand) or just extra effort
(more typing) -- will make it less likely for people to write documentation.
Can you imagine how quickly people would simply stop writing any
documentation at *all* if we were to have doc-sig police jumping around
pointing out "Oh, your docstring has incorrect syntax because you didn't
escape this bracket with an @-sign here." or "You used that semantic tag
incorrectly; you should have used [var] instead of [code]."?

Remember the old adage that says that every math equation in a book
will cut its readership in half?  Well, imagine something similar --
every additional syntactic construct or tag will cut the *writership*
in half (or by some fraction).  Even though more tags may mean more
expressive power, they also mean a more difficult choice to make every
time one uses a tag -- beyond a certain point it becomes debatable
which tag is the correct one to use, and then all is lost.  Also, the
more complex the system becomes, the less predictable the failure modes
will be -- until we reach the point where we have to debug docstrings
(eek!).

The absolute priority here is to make docstrings *dead* easy to write.
That entry barrier has to go way down -- in the limit, to zero, where
even existing docstrings written without knowledge of our discussion
can be considered rich and "correct" docstrings.

(The example is generated from SocketServer.py as distributed with
Python 1.5 -- nothing has been edited.)


[2.]

As for the proliferation of long and short tags such as arg, code,
data, etc., i first ask: what is the purpose?  Is this a solution in
search of a problem?  What will an automatic documentation generator
use these tags for?  For example, you can devise a structure where
you list function arguments and mark up each one, but since the
descriptions next to them are just in plain English, what good would
a nicely-organized table do for a machine anyway?  When would you
ever need to collect descriptions of random individual arguments
without looking at the function docstring as a whole?

(Can we expect it to do much better than the example at
http://www.lfw.org/python/SocketServer.html ?  At what cost to the
writers and readers of docstrings do we achieve such additional power?)
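For comparison, here is a minimal sketch of what a generator can already learn from a function itself, with no argument tags at all. It assumes modern Python 3's inspect module (inspect.signature did not exist back in Python 1.5), and finish_request below is a hypothetical stand-in with a plain-English docstring, not code copied from SocketServer.py.

```python
import inspect


def describe(func):
    """Build a one-line summary from the signature and the docstring's first line."""
    doc = inspect.getdoc(func) or ""
    summary = doc.splitlines()[0] if doc else "(undocumented)"
    # The argument list comes from the function object, not from markup.
    return "%s%s -- %s" % (func.__name__, inspect.signature(func), summary)


# Hypothetical example function: plain English, no tags anywhere.
def finish_request(self, request, client_address):
    """Finish one request by calling the handler class.

    The request and client_address are passed straight through.
    """


if __name__ == "__main__":
    print(describe(finish_request))
```

The argument names, their order, and any defaults are recovered for free; the only thing the writer supplied was ordinary prose.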


[3.]

The people who are going to be writing and reading these docstrings
have already invested considerable mental effort in the syntax of
one language, Python.  Let's capitalize on the context we can gather
from that, rather than introducing an entirely independent syntax
for people to learn as well.

We can get a lot of mileage out of this.  For example, identifiers
that appear after "self." are clearly instance attributes; identifiers
that are immediately followed by "(" are function or method calls;
we can determine class and function names by introspection; and so on.
These conventions are already used by many people; all we have to do
is introduce a little payoff, and make sure we keep the conventions
very straightforward and the payoff predictable, to encourage and
strengthen the use of these conventions.

(The example uses this technique to mark up class and method names
with hyperlinks, and to make attribute names stand out.)
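A minimal sketch of this kind of convention-based markup, assuming plain HTML output. The helper name markup() and the known_names set are hypothetical; the set is assumed to have been gathered by introspecting the module. This is only an illustration of the technique, not the script that produced the example page.

```python
import html
import re

# Conventions assumed here (a sketch, not a spec):
#   self.name  -> an instance attribute, rendered in bold
#   name(      -> a call, linked only if name is a known class or function
ATTR_RE = re.compile(r"\bself\.([A-Za-z_]\w*)")
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\(")


def markup(docstring, known_names):
    """Render a plain docstring as HTML using only Python's own conventions.

    known_names is a set of class and function names, assumed to come
    from introspecting the module being documented.
    """
    text = html.escape(docstring)

    def link_call(match):
        name = match.group(1)
        if name in known_names:
            return '<a href="#%s">%s</a>(' % (name, name)
        return match.group(0)  # leave unknown calls untouched

    text = ATTR_RE.sub(r"self.<strong>\1</strong>", text)
    return CALL_RE.sub(link_call, text)


if __name__ == "__main__":
    names = {"BaseServer", "serve_forever"}
    print(markup("Call serve_forever() after setting self.timeout.", names))
```

Unknown names are simply left alone, so a docstring that mentions some outside call is rendered unchanged rather than flagged as an error.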



Finally, about the example: this is the example i showed at the end
of the Doc-SIG meeting on Developer Day at IPC8.  It was generated
automatically from the stock SocketServer.py by a program that imports
the module and introspects into its classes and functions.  We use
this script (except for the hyperlinking -- that was a recent
addition) at ILM and it works for us fairly well.  See:

    http://www.lfw.org/python/SocketServer.html
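The import-and-introspect approach can be sketched with the standard inspect module. This is a hypothetical reconstruction for modern Python 3 (where SocketServer is spelled socketserver), not the ILM script itself; the name document_module() and the rule of skipping underscore-prefixed and imported names are assumptions.

```python
import inspect


def document_module(module):
    """Walk a module and return (kind, qualified_name, docstring) tuples.

    Only public names defined in the module itself are included; anything
    imported from elsewhere is skipped.
    """
    entries = []
    for name, obj in inspect.getmembers(module):
        if name.startswith("_") or getattr(obj, "__module__", None) != module.__name__:
            continue
        if inspect.isclass(obj):
            entries.append(("class", name, inspect.getdoc(obj) or ""))
            # Methods defined directly on this class (inherited ones are skipped).
            for mname, meth in inspect.getmembers(obj, inspect.isfunction):
                if not mname.startswith("_") and mname in vars(obj):
                    entries.append(("method", name + "." + mname,
                                    inspect.getdoc(meth) or ""))
        elif inspect.isfunction(obj):
            entries.append(("function", name, inspect.getdoc(obj) or ""))
    return entries


if __name__ == "__main__":
    import socketserver  # Python 3 name of SocketServer

    for kind, qualname, doc in document_module(socketserver):
        summary = doc.splitlines()[0] if doc else "(undocumented)"
        print("%-8s %-40s %s" % (kind, qualname, summary))
```

Nothing in the module has to be written with this tool in mind; whatever docstrings already exist are picked up as-is.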

Further improvements:

  - Many modules contain documentation in # comments at the
    beginning of the module, or immediately before functions or
    classes.  The script could look for documentation in these
    places as well, if docstrings are not found.  (This is now
    done in my local copy, though it wasn't in the script when
    i demonstrated it at IPC8.)  (Side note: i've updated all the
    modules in the standard library that did this, by moving the
    # comments into proper docstrings, but this is still likely
    a good feature to have, as i bet lots of other modules out
    there use # comments instead of docstrings.)

  - The script oughta scan the module for constants as well.
    Constants could be documented with a # comment on the same
    line as the constant assignment.

  - Given knowledge of other modules in the system, the script
    could produce hyperlinks where documentation in one module
    references functions or classes in another.
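The first two improvements might look something like this in modern Python 3. inspect.getcomments() returns the # comment lines immediately above a class or function (or at the top of a module), which covers the fallback case. Treating only ALL_CAPS names as constants is an assumption, as are the hypothetical helper names doc_or_comments() and scan_constants().

```python
import ast
import inspect
import io
import tokenize


def doc_or_comments(obj):
    """Return obj's docstring, falling back to the # comments just above it."""
    doc = inspect.getdoc(obj)
    if doc:
        return doc
    # getcomments() returns None if there are no comments or no source.
    comments = inspect.getcomments(obj)
    if not comments:
        return ""
    return "\n".join(line.lstrip("#").strip() for line in comments.splitlines())


def scan_constants(source):
    """Map NAME -> comment for top-level 'NAME = value  # comment' lines.

    Assumes constants use ALL_CAPS names; needs Python 3.8+ for end_lineno.
    """
    # Record every comment by the line it appears on.
    comments = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comments[tok.start[0]] = tok.string.lstrip("#").strip()

    constants = {}
    for node in ast.parse(source).body:  # top-level statements only
        if (isinstance(node, ast.Assign) and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and node.targets[0].id.isupper()
                and node.end_lineno in comments):
            # A comment on the assignment's last line must be a trailing one.
            constants[node.targets[0].id] = comments[node.end_lineno]
    return constants


if __name__ == "__main__":
    sample = (
        "TIMEOUT = 30  # seconds to wait before giving up\n"
        "name = 'x'    # lower-case, so not treated as a constant\n"
        "DEBUG = False\n"
    )
    print(scan_constants(sample))
```

Run as a script, the sample yields only TIMEOUT with its comment: the lower-case name is skipped and DEBUG has no trailing comment to report.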

In short -- let's make it so easy to write rich docstrings that people
do it correctly without even knowing that they are doing it.



-- ?!ng

"If I have not seen as far as others, it is because giants were standing
on my shoulders."
    -- Hal Abelson