[Doc-SIG] Evolution of library documentation

Tony J Ibbs (Tibs) tony@lsl.co.uk
Mon, 12 Mar 2001 10:44:14 -0000


Ka-Ping Yee wrote:
> [resent with individual cc addresses, since mail.python.org is down]

Is it? OK - everyone's going to get two copies...

> The introduction of pydoc places more emphasis on docstrings in the
> source code.  I think this is generally good, since keeping the
> documentation close to the source makes it more likely to be kept
> up to date.

Agreed in so far as it goes.

> However, it also produces the potential for duplication
> of effort in maintaining both the docstrings and the LaTeX file for
> the library reference.

Hmm. I've had this argument before.

Maintaining two different things is maintaining two different things. If
the docstrings are sufficient (and we now have tools to extract them and
format them), then well and good. But if they are not, then a different
sort of document is just that - different.

> The LaTeX documentation seems to be motivated by the richer metadata,
> the greater control over formatting, and the ability to present a
> long tutorial or detailed explanation.

Yes. Although I'm not worried personally if it's LaTeX or (for instance)
DocBook XML.

> At the Python conference, a small group of us

Ah, the Spanish Inquisition. Which is why I didn't expect it (sorry -
not *really* getting at people - well, maybe just a little)

> discussed the possibility of merging the external and internal
> documentation; that is, moving the library reference into the
> module source files.

Hmm. I'll rant about this a little later on.

> It would no longer be written in TeX so that you wouldn't have
> to have TeX in order to produce documentation.

Not *necessarily* a bad goal (although I would point out it's
*significantly* easier to "have TeX" than, for instance, to "have CVS",
which one is also required to have to do development with modern Python -
a *serious* problem for some of us).

> This would address the duplication problem and
> also keep all of a module's documentation in one place together with
> the module.

Now, if you said "package" I'd be happy, but since it's "module", I'll
gripe.

> To avoid forcing you to page through a huge docstring
> before getting to the source code, we would allow a long docstring to
> go at the end of the file (or maybe collect docstrings from anywhere
> in the file).

Aagh! No, sorry, my problem wouldn't be with paging (although that *is*
a problem - and why is the end of the file so different than the
front? - I page from both ends, depending on context!).

Source files are for source code. I want to be able to *treat* them as
such. It is quite possible for a two page source to have ten or more
pages of documentation associated with it. That does *not* belong in the
same *file* as the source - if someone *wants* to associate them
closely, the correct way to do it is with a *package*.

Let's see if I can explain this a bit better.

Files are a useful way of organising data, but good practice doesn't
stuff things into one file when they are better organised as two or
more. That's why we split source code up into multiple files - a good
language like Python allows and encourages this, so that even if one
only has one entry point into a package, the writer can still choose to
split it up logically into multiple "internal" files. Keeping file size
down also has advantages - it makes it easier to navigate the file both
"physically" (with an editor) and "conceptually" (remembering what is in
the file and why). It's related to the "don't let functions/methods get
too big" idea.

Files are also, in many filesystems, *typed*. That is, the file "name"
has an indication of what is *in* the file. Using this information can
be a big win.

Docstrings are for inserting "point" documentation, targeted
documentation that relates to the particular object the docstring is
attached to. This is a Good Thing, and one of the most important
additions to Python over the last few years. The key idea here is that
targeting - the documentation is in the docstring (and thus in the file)
because it belongs *with* what it is documenting.
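
As a small sketch of what "point" documentation means in practice (the
names `Stack` and `push` are invented for illustration and come from no
real library): each docstring lives on the object it describes, and a
tool can pull it back out object by object.

```python
import inspect


class Stack:
    """A last-in, first-out container (hypothetical example class)."""

    def __init__(self):
        self._items = []

    def push(self, item):
        """Add item to the top of the stack."""
        self._items.append(item)


# Each docstring is attached to the object it documents, so a tool
# (or an IDE) can retrieve the text for exactly that object.
class_doc = inspect.getdoc(Stack)
method_doc = inspect.getdoc(Stack.push)

print(class_doc)
print(method_doc)
```

Nothing here needs to know about the rest of the file; the targeting is
the whole point.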

Tutorial, reference and other "grander scope" documentation relates to
the source code as a whole. So the "object" it belongs to is the module
or package (or perhaps part of it). As such, one *might* argue that, for
a single module package, it belongs in that module's docstring. But one
then has to decide which of the sorts of documentation "belongs" there,
since there is only one slot. I argue for it being whatever the module
writer wants (!), but normally/notionally an overview to allow a source
code reader (or person browsing with an IDE) to get a handle on what is
going on.

   (That's an important point - docstrings must be suitable
    for browsing with an IDE.)

*Because* "grander scope" sorts of documentation relate to the package
or module as a whole, I think they deserve a separate file. OK, so if
you want it closely coupled, that makes more things packages. Tough. A
package can "look like" a module if it wants.

Also, *because* one might have more than one sort of "grander scope"
documentation for a module/package, you will have to consider
*supporting* more than one. Difficult if it is "just" a string tacked on
the end.
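
A rough sketch of the layout being argued for, under assumptions of my
own: the package name `shapes_demo`, the internal file `_impl.py`, and
the two documentation file names are all hypothetical. The package
re-exports its one public function so that it "looks like" a module,
keeps only a short overview in its docstring, and holds more than one
sort of longer document as separate files in the same directory.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk (every name here is hypothetical).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "shapes_demo")
os.mkdir(pkg_dir)

# "Internal" file holding the actual source code.
with open(os.path.join(pkg_dir, "_impl.py"), "w") as f:
    f.write(
        "def area(w, h):\n"
        '    """Return the area of a w by h rectangle."""\n'
        "    return w * h\n"
    )

# __init__.py carries a short overview docstring and re-exports the
# public name, so callers use the package as if it were a module.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write(
        '"""Short overview of the shapes_demo package."""\n'
        "from ._impl import area\n"
    )

# Two different sorts of "grander scope" documentation, each in its
# own file beside the source rather than inside it.
for name in ("tutorial.txt", "reference.txt"):
    with open(os.path.join(pkg_dir, name), "w") as f:
        f.write("Longer documentation would go here.\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
shapes_demo = importlib.import_module("shapes_demo")

result = shapes_demo.area(3, 4)
overview = shapes_demo.__doc__
doc_files = sorted(n for n in os.listdir(pkg_dir) if n.endswith(".txt"))
print(result, overview, doc_files)
```

The source file stays short, the docstring stays suitable for browsing,
and adding a third kind of document is just adding a third file.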

> That leaves the metadata and formatting issues.  When i suggested this
> idea (of merging in the external documentation) to Guido, he was
> initially against it.  He was very concerned about the loss of
> information in the TeX markup.  In order to even consider switching
> formats, he requires that we preserve as much metadata as possible
> from the TeX docs (so that, for example, we can still generate a
> useful index).

I agree with Guido (gosh!) on this. My reasons are based on long term
use of documentation tools, and also on good programming practice, as
well as gut instinct (which is, of course, also based on those things!).
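
To illustrate why that metadata matters, here is a minimal sketch that
builds an index from semantically tagged text. It assumes tags of the
form `\function{...}`, `\module{...}` and `\class{...}`; the sample text
is invented, and this is not how the real documentation tools work.

```python
import re
from collections import defaultdict

# Invented sample text using semantic tags (an assumption for this sketch).
sample = r"""
The \function{open} call in \module{os} returns a descriptor.
See also \class{Popen} and \function{open}.
"""

# Match a tag name followed by its braced argument.
TAG = re.compile(r"\\(function|module|class)\{([^}]*)\}")

# Map each marked-up name to the kinds of thing it was tagged as.
index = defaultdict(set)
for kind, name in TAG.findall(sample):
    index[name].add(kind)

entries = sorted((name, sorted(kinds)) for name, kinds in index.items())
for name, kinds in entries:
    print(name, kinds)
```

With plain prose and no tags, the same text gives a tool nothing to tell
a function name from any other word, which is the loss of information at
issue.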

The reason for adopting ST (or some variant) for markup in docstrings
is, basically, because it is acknowledged that many people will not
create docstrings with more markup than that, or with more obtrusive
markup than that.

I'll say that again slower - there are two reasons for ST (or similar)
in docstrings:

1. People *will not* mark up heavily (we cannot make them do it, they
will not do it), so we need to specify a markup that doesn't have a high
learning curve, and that doesn't have many *ways* of marking up.

2. People will not use an "obtrusive" markup, like TeX, XML or HTML,
because they perceive it as "difficult to read".

Those two are, of course, different faces of the same thing.

Now, *if* we are to retain all of the markup meaning that the TeX
documentation has, we will *have* to have more complex markup. ST is
predicated on the idea that it is very simple to read (it is not
accidental that it looks very much like email). STpy is already
straining at that a bit by introducing '#...#' (which we think we need).
And I am not convinced that there is an ST-natural way of quoting a
single quote as a literal character (which is the sort of thing one
*has* to be able to do for proper markup of a detailed text on some
issues).

Thus, despite the ability to write a book (the Zope book, for instance)
in ST<variants>, ST has to stay not much more complex than it is, or
people won't use it.

Worse, if one tries to continue using "simple" markup in ST, one is
going to end up with strained analogies, and with almost any
non-alphanumeric character having a special meaning. Yuck (can we say
Perl?).

The obvious way round that is to start doing, well, markup - for
instance, '@class(..)' or somesuch (like Pod, I think? - or GNU
texinfo). In which case we're inventing our own little markup language
again, with none of the reasons for doing it that went into ST.

And I for one reckon that we probably don't have a
Guido-of-the-markup-languages hanging around on our list (it's
statistically unlikely). Indeed, if one *needs* markup, then the obvious
thing to do is to steal someone else's (I for one don't care much
*which* markup language one steals - TeX, Pod, texinfo and DocBook XML
all have their advantages and disadvantages. I thought we'd delegated
that decision to Fred Drake).

> But i still think that getting all the docs together in one place is
> a goal worth at least investigating.

Depends on how tightly coupled "one place" is - the same *directory*,
maybe. The same *file* - naff idea.

> So i have gone through the TeX
> files in the Doc/lib directory and extracted a list of all the TeX
> markup tags that are used there.  Here follows my list; i have
> attempted to categorize the purpose of the tags by hand.

At which point I think I rest my case - there are *lots* of these.


I sincerely hope that we don't adopt this proposal as stated. I
*wouldn't* object to a proposal that said that documentation source
files should (maybe) live with source source files (although people who
don't want to download the documentation might well object!). And I am
open-minded about the formatting language used for such "grander scope"
documentation, although I think we should not be trying to invent our
own (I suspect that a DocBook XML variant is probably what we want,
since it is a skill that seems to have application elsewhere).


I've *got* to go and do paid work now.

Tibs

--
Tony J Ibbs (Tibs)      http://www.tibsnjoan.co.uk/
"How fleeting are all human passions compared with the massive
continuity of ducks." - Dorothy L. Sayers, "Gaudy Night"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)