[Doc-SIG] suggestions for a PEP

Tony J Ibbs (Tibs) tony@lsl.co.uk
Tue, 13 Mar 2001 10:43:09 -0000


Tavis Rudd wrote:
> 2- a FORMALIZED version of structured text should be used for inline
> formatting.  There's no need to repeat the justifications here.
> The final version of structured text should include a
> facility for storing meta-data in a field format that is easily
> identifiable to both the human eye and the parsing tool.
> (e.g. authors, version, keywords, spam)

OK. Historically (yes, Doc-SIG has been around long enough to have
history), this is where efforts start to founder. The cycle goes:

	* long quiet period
	* flurry of agreement that we want an ST variant
	  and some near agreement on what we want
	* someone says they're starting an implementation
	* people (ahem - in the past including me) start to
	  discuss the *formalisms* that need to be enforced
	  on people to allow information to be automatically
	  extracted from docstrings
	* flurry of argument
	* list goes quiet for long period

Do you see the problem?

What I am working towards (and I shall have to add this to the PEP I've
just started work on) is phasing this carefully:

	1. Decide on STpy (or an ST variant), with minimal
	   extensions
	2. Produce an application that parses it
	3. Get it accepted, and get people using it
	4. THEN, once we've changed the culture, spring
	   our wonderful scheme for semi-formal markup
	   on them, that allows them to extract special
	   information.

Two quick comments before you jump up and down at that:

a. I don't actually believe that you are going to get most Python
programmers to DO semi-formal markup for you. I *do* believe there's a
good chance we'll get many people to write at least a little human
readable text. So guess what I'm after...

b. If we provide ST<whatever> to mark up said text, and it is easy to
use, then most people will use it. And then, hey presto, we'll magically
get *some* added value towards data extraction (at least, for instance,
the sort of function signatures that IDLE and Pythonwin will present in
a tool-tip).

c. (ok - three things) you're more likely to win the "and please add
more markup" battle *after* people have gotten used to the markup.

d. (oh, I give up) see also
   http://www.tibsnjoan.co.uk/STpy.html#taggedparas
   which describes things like::

	Author: Guido van Rossum

   and::

	Arguments:
	   fred -- this is a useless name for an argument

   Although it doesn't go into detail, the idea *is* that there should
be some "requirement of structure" for such tags - that is, that
arguments should be followed by a descriptive list, and so on. This is
unlikely to be enforced early on, and may only be so via the DTD for the
final DOM tree, but the idea *is* there. And although it says there that
it should be left for second phase implementation, the start of support
is already in docutils.
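
As a rough illustration of the tagged paragraphs above, here is a minimal
sketch of how a tool might pick them out of a docstring. The function
name `parse_tagged_paragraphs` and both regular expressions are
hypothetical. They are not taken from STpy or docutils, and they only
cover the two shapes shown (a one-line `Tag: text`, and a `Tag:` followed
by indented `name -- description` items).

```python
import inspect
import re

# Assumed shapes only: "Tag: text" at the left margin, and indented
# "name -- description" items underneath a bare "Tag:" line.
TAG_RE = re.compile(r"^([A-Z][A-Za-z]*):\s*(.*)$")
ITEM_RE = re.compile(r"^\s+(\w+)\s+--\s+(.+)$")


def parse_tagged_paragraphs(docstring):
    """Return a dict mapping each tag to its text, or to a dict of items."""
    fields = {}
    current = None  # tag whose descriptive list we are currently inside
    for line in inspect.cleandoc(docstring).splitlines():
        tag = TAG_RE.match(line)
        item = ITEM_RE.match(line)
        if tag:
            name, text = tag.groups()
            if text:
                fields[name] = text
                current = None
            else:
                fields[name] = {}
                current = name
        elif item and current is not None:
            fields[current][item.group(1)] = item.group(2)
        elif not line.strip():
            current = None  # a blank line ends the tagged paragraph
    return fields


EXAMPLE = """
    Author: Guido van Rossum

    Arguments:
       fred -- this is a useless name for an argument
"""

print(parse_tagged_paragraphs(EXAMPLE))
```

Nothing here enforces the "requirement of structure": an `Arguments:`
tag with no list after it simply yields an empty dict, which is where a
later, stricter phase could complain.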

> 3- no changes should be required to the python parser

of course not.

> 4- the module's namespace should not be polluted and it's memory
>      requirements should not be inflated by use of inline
> documentation

erm - sorry? if you want your inline docs to be available to a browser
(and I *do*) then they've pretty well got to be around somewhere!

> 5- therefore, the existing __doc__ docstrings should be used
> for very short
>      synopyses, and extended documentation that is discarded at the
>      the byte-compile stage should be written in string
> literals that appear
>      immediately after the existing docstrings. These extra
> string literals
>      would be written in ST, while the __doc__strings would
> be in plain text.
>      These two forms of API docs should complement and not
> duplicate each
>      other.

Sorry - I can't be bothered to reformat that - blame Outlook <fx:spit>
if you like.

I disagree strongly. Keep it simple. A docstring is a docstring is a
docstring. And in it goes the documentation for the entity. Using extra,
magical, string literals is uncool. And anyway, I've been writing
docstrings in something close to ST (without thinking about it) for
donkey's years - we WANT markup in docstrings! That's why we're doing
this - not to impose on other people, but for ourselves!
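
To make the difference concrete, here is a small sketch of what Python
itself does with the two kinds of string. The function `spam` and its
contents are made up for illustration. Only the first literal becomes
`__doc__`; the second is a bare expression statement that is not bound
to any name, so a browser or help tool has nowhere to find it at run
time.

```python
def spam():
    """Short synopsis kept in __doc__."""
    """Extra literal: evaluated as an expression, bound to nothing."""
    return 42


# Only the first string literal is available as documentation.
print(spam.__doc__)
```

A tool that wanted the second literal would have to go back to the
source text and find it for itself, which is exactly the sort of magic
being objected to.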

>      See the example module attached to this message.
>
> 6- the documentation parsing tools should be capable of
> producing output in
>      many formats (manpages, plain text, html, latex, for a start),

See HappyDoc - that's not a problem. But the *parsing* tools don't
produce output - the output tools do (docutils is a parsing tool - it
parses docstrings. It happens to have a not-very-sophisticated
docstring-finder built in (but both pydoc and HappyDoc do better jobs,
in different ways, 'cos that's what they're for), and it happens to have
a not-very-clever HTML outputter (but see the previous comment), but
those are just for testing and proof-of-concept and examplitude...)

> 7- the doc parsing tools should not need to import and run
> the module to
>       produce it's documentation (for security reasons alone)

Debatable, and depends on what you're after. pydoc does (it's part of
its requirements if it's to be used as a "help" facility) and HappyDoc
doesn't. That's to do with the *other* tools, not to do with the
docstring tool.

Think modules (or packages if you prefer) - a package to understand
docstring contents, and render them in a form other modules can use, but
that package need not know how to find them or what to do with them
after unpicking them.
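
As a sketch of the "finder" half of that split, here is one way a tool
could collect docstrings without importing the module, assuming a
present-day Python with the standard `ast` module. The name
`find_docstrings` and the sample `SOURCE` are hypothetical, and this is
not how pydoc, HappyDoc or docutils do it. The point is only that
finding docstrings and understanding their contents are separate jobs.

```python
import ast

# Sample module source, held as text so that nothing in it is executed.
SOURCE = '''
"""Module docstring."""

def fred(x):
    """Return x unchanged."""
    return x

class Spam:
    """A class docstring."""
    def eggs(self):
        pass
'''


def find_docstrings(source):
    """Map dotted names to docstrings by parsing, not running, the source."""
    tree = ast.parse(source)
    found = {"<module>": ast.get_docstring(tree)}

    def visit(node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef,
                                  ast.ClassDef)):
                name = prefix + child.name
                found[name] = ast.get_docstring(child)
                visit(child, name + ".")

    visit(tree, "")
    return found


for name, doc in find_docstrings(SOURCE).items():
    print(name, "->", doc)
```

The resulting strings could then be handed to whatever package
understands the markup, which need not know where they came from.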

> 8- module Library Reference documentation should also be kept
> in the same
>      file as the module source.  It should compliment the API
> docs with
>      examples, extended discussions of usage, tutorials, test
> code, etc.,
>      but should not duplicate the API reference material.

I've had that argument.

> 9- the Library Reference docs should be written in string
> literals, as with
>      the extended API docs proposed in pt. 5, but there
> should be a prefix
>      token such as """LIBREF:  at the start of each chunk to
> signal to the
>      doc tools that the following text is not part of the API
> ref.  The token
>      would allow this documentation to be split up into
> chunks that can appear
>      anywhere in the source file (a la perl's POD).

Erm - yuck. BTW, if we *want* POD, and that's how POD does it, then we
should *adopt* POD (in which case, after adoption, it is no longer
yuck). But I'm agin it, for reasons argued in the past.

> 10- the Library Reference documentation should also be
> written in ST as
>       using LaTeX here would force the module author to learn
> yet another
>       mark-up language, require the documentation user to
> install yet another
>       processing tool (although this isn't an issue on
> Linux), and would place
>       too much emphasis on the separation between the API and
> library
>       reference docs and discourage synchronization as the
> module evolves!
>       The same argument applies to maintaining the status quo
> of external doc
>       files.

Markup languages (actually, LaTeX is a TeX macro language, so is
*technically* still a typesetting language, with added markup
potential - this is important to understanding it properly) are just
languages. It does the soul good to learn another one, just as it does
with programming languages. I feel I need an Alex Martelli argument
here, so please imagine one for me.

>       Any extra meta-data that is needed for proper indexing, etc.
>       (to meet Guido's concerns) should be included as fields
> in the string
>       literals as is done in JavaDoc (but not neccessarily
> with that syntax).

The question of referring to Python data (e.g., #module.fred#) has been,
shall we say, heavily deferred for the moment, 'cos it's contentious.
Ka-Ping has *very* clever approaches to semi-automating it, and I have a
sneaky feeling that if he keys it off '#..#' strings, many of my own
past objections would go away (erm, have you bothered to read this far,
Ka-Ping?). But there are also proposals for extra markup (such as
'^#..#'). It's another of those things best left to second-phase, in my
opinion.

> What do you think?

I think that your comments make sense in context, but (not your fault)
are often going over ground that has been trodden before. I *strongly*
feel that we need to get a *useful* (but not too ambitious) candidate
implementation out the door before we get bogged down again, and if
people hadn't started up all this discussion this week I'd be well on
the way to it by now (although, actually, writing a PEP *is* a very good
idea, and should have been done earlier).

> p.s. Other issues to consider:
> - caching of documentation so it doesn't have to be regenerated
>    every time it's used

Tool issue.

> - documenting Packages

Goes in __init__.py - this sort of stuff just falls out naturally, I'm
afraid, which is why we know Guido is a DGLD (damn good language
designer)

> - inheriting documentation (Edward Loper's idea)

Docstrings inherit like any other value, surely?
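
A quick sketch of what that means in practice, assuming Python 3
semantics and using made-up classes `Base` and `Child`. A method that is
not overridden is the same function object, so its docstring comes along
with it. A method that is overridden without a docstring has none, and a
class's own `__doc__` is not picked up from its base, so a tool that
wants "inherited" documentation in those cases has to look along the
base classes itself.

```python
class Base:
    """Base class docs."""

    def kept(self):
        """Documented in Base."""

    def overridden(self):
        """Also documented in Base."""


class Child(Base):
    # No class docstring, and the override below has none either.
    def overridden(self):
        pass


print(Child.kept.__doc__)        # the function inherited from Base
print(Child.overridden.__doc__)  # the undocumented override
print(Child.__doc__)             # the class's own docstring slot
```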

> - hiding API docs for __privateInternals (ditto)

Tool issue - given a DOM tree one can prune it, and anyway adding a
qualifier to docutils to say "don't collect *these* sorts of thing" is
trivial (but there are *so many* trivial things, and they are so
*obviously* trivial, that one has to leave some of them until later on)
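
For what it's worth, here is a minimal sketch of the pruning idea, using
the standard `xml.dom.minidom` module. The element and attribute names
(`module`, `function`, `name`) and the helpers `is_private` and
`prune_private` are assumptions made up for this example; they are not
the docutils DOM. "Private" is taken here to mean a name with two
leading underscores and no trailing pair.

```python
from xml.dom import minidom

# A made-up DOM tree standing in for whatever the docstring tool builds.
XML = ('<module name="example">'
       '<function name="public_api"/>'
       '<function name="__privateInternals"/>'
       '<function name="__init__"/>'
       '</module>')


def is_private(name):
    """True for __private names, but not for __special__ ones."""
    return name.startswith("__") and not name.endswith("__")


def prune_private(document):
    """Remove <function> nodes whose name is private, in place."""
    # Copy the node list first, since we remove nodes while looping.
    for node in list(document.getElementsByTagName("function")):
        if is_private(node.getAttribute("name")):
            node.parentNode.removeChild(node)
    return document


doc = prune_private(minidom.parseString(XML))
names = [n.getAttribute("name") for n in doc.getElementsByTagName("function")]
print(names)
```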

> - documenting extensions in other languages

A big issue - leave it alone for now!

> - comments within the markup language

Why? I used to argue for these, Eddy convinced me not to. You'll have to
come up with a convincing argument of *exactly* why you need these...

Tibs - I'm sorry if any of that appears brusque, but I've got urgent
paid work to do as well, so have to type fast...

--
Tony J Ibbs (Tibs)      http://www.tibsnjoan.co.uk/
Give a pedant an inch and they'll take 25.4mm
(once they've established you're talking a post-1959 inch, of course)
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)