[Doc-SIG] suggestions for a PEP

Edward D. Loper edloper@gradient.cis.upenn.edu
Mon, 12 Mar 2001 20:49:08 EST


I'd be happy to write up a PEP, but I'm curious what things people
think it should include.  In particular, which of the following
should it include:
  1. The definition of a particular markup language?
  2. Specification of the *semantics* of a markup language (as 
     opposed to just its syntax)?
  3. Specification of what is "appropriate" to put in a docstring?
  4. Specification of the "intended semantic content" of various
     "slots" like docstrings and strings immediately following
     docstrings?
  5. Specifications of tools for docstrings?  Possibly of tools
     that would eventually be included in the standard library?
  6. whatever else you can think of..

Some of these questions require further work to answer (such as 1,
where no one can currently give a good definition of any version
of StructuredText)..  Others will probably involve some agreement
by the community..

I think it's important to distinguish the tools that process 
docstrings from any attempt to define what should go *in*
docstrings (or similar places), whether that means what type
of information or what type of markup.  The PEP should address
what goes *in* docstrings, but shouldn't necessarily have much
to say about the tools that process the docstrings.  (Leave
those for a standard library extension (do those get PEPs?).)

Anyway, I wanted to comment on some of what Travis Rudd wrote:
>> 1- module API documentation should be in the same file as source

I will assume that "API" documentation is what I was advocating
earlier.  I.e., a clear, concise, unambiguous *definition* of 
a Python object, such that you can tell exactly what is guaranteed
by the object just by reading the definition.  If that's *not*
what you mean, please say what you do mean.

>> 2- a FORMALIZED version of structured text should be used for inline
>>      formatting.  There's no need to repeat the justifications here.

Yay! :)  I would add "safe" as well, but I'll argue that another
day..

>>    The final version of structured text should include a facility for 
>>     storing meta-data in a field format that is easily identifiable to both 
>>     the human eye and the parsing tool.
>>     (e.g. authors, version, keywords, spam)

My favorite suggestion is to just use top-level description list items
and top-level headings with a single level of description list items
to store meta-data.. and to have "reserved" keys for description lists
(in these contexts).  E.g.::

   def primes(n):
       """
       Return a list of all the prime numbers from 1 to n, exclusive.

       Parameters:
           n -- The upper limit for the prime numbers to return.  A
                prime number will be returned if and only if it is
                less than 'n'.

       Types:
           n -- int

       ReturnType -- 'list' of 'int'
       Version -- 1.0
       Author -- Edward Loper
       See -- #other_primes#
       """
       ...

Also, I think that *all* of these special marked meta-data fields must
be optional.  (Of course, if the user wants to use a program that
checks to make sure that ReturnType is defined for every method, they
can.. but it's not required in general).
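
As a rough sketch of how a tool might pull these fields out of a
docstring like the one above (the function name `parse_metadata` and
the two sets of reserved keys are assumptions made up for illustration,
not an agreed format, and every field is treated as optional):

```python
import inspect
import re

# Hypothetical reserved keys; no list has actually been agreed on.
RESERVED_FIELDS = {"ReturnType", "Version", "Author", "See"}
RESERVED_HEADINGS = {"Parameters", "Types"}

ITEM = re.compile(r"^(\s*)(\S+)\s+--\s+(.*)$")   # "key -- value"
HEADING = re.compile(r"^(\w+):\s*$")             # "Heading:"


def parse_metadata(docstring):
    """Return (fields, sections) found in a docstring.

    fields   -- top-level "Key -- value" items with reserved keys
    sections -- reserved "Heading:" -> {key: value} for its list items
    """
    fields, sections = {}, {}
    heading = None   # reserved heading we are currently inside, if any
    target = None    # (dict, key) that continuation lines extend
    for line in inspect.cleandoc(docstring).splitlines():
        if not line.strip():
            target = None            # a blank line ends the current item
            continue
        indented = line[0].isspace()
        h = HEADING.match(line)
        m = ITEM.match(line)
        if h and h.group(1) in RESERVED_HEADINGS:
            heading = h.group(1)
            sections[heading] = {}
            target = None
        elif m and not indented and m.group(2) in RESERVED_FIELDS:
            heading = None
            fields[m.group(2)] = m.group(3).strip()
            target = (fields, m.group(2))
        elif m and indented and heading:
            sections[heading][m.group(2)] = m.group(3).strip()
            target = (sections[heading], m.group(2))
        elif indented and target:
            store, key = target      # continuation of the previous item
            store[key] += " " + line.strip()
        else:
            heading = None           # ordinary prose, not meta-data
            target = None
    return fields, sections


SAMPLE = """
    Return a list of all the prime numbers from 1 to n, exclusive.

    Parameters:
        n -- The upper limit for the prime numbers to return.  A
             prime number will be returned if and only if it is
             less than 'n'.

    Types:
        n -- int

    ReturnType -- 'list' of 'int'
    Version -- 1.0
    Author -- Edward Loper
    See -- #other_primes#
    """

fields, sections = parse_metadata(SAMPLE)
```

A docstring with none of these keys just yields two empty dictionaries,
which is the point: a stricter checker can complain about a missing
ReturnType if it wants to, but the format itself doesn't.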

>> 3- no changes should be required to the python parser
>> 
>> 4- the module's namespace should not be polluted and its memory 
>>      requirements should not be inflated by use of inline documentation
>> 
>> 5- therefore, the existing __doc__ docstrings should be used for very short
>>      synopses, and extended documentation that is discarded at the
>>      byte-compile stage should be written in string literals that appear
>>      immediately after the existing docstrings. These extra string literals 
>>      would be written in ST, while the __doc__ strings would be in plain
>>      text.
>>      These two forms of API docs should complement and not duplicate each 
>>      other.

I think that there definitely *are* instances where you want to get at
these strings from within python, esp. if you're using the interpreter.
One thing I really loved about python when I was learning it was that
I could get decent help on just about anything very easily.  I therefore
propose the following:
   1. STdoc strings appear after __doc__ strings, as I said before.
   2. For now, these strings are thrown away by the compiler.
   3. At some future date, the compiler could be modified so that, at
      user option, it would produce ".pyd" files as well as ".pyc"
      files.  These contain all the STdoc strings from the file, and
      can be accessed via the interpreter somehow.  Python would *not*
      format them, it would just copy them.  Maybe create a dictionary
      from identifier name to string, and pickle it.
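
Here's a minimal sketch of what point 3 might look like as an external
tool rather than a compiler change.  It uses the standard `ast` and
`pickle` modules on a current Python; the function names, the dotted
naming scheme, and the sample source are all made up for illustration.
The source is only parsed, never imported or run:

```python
import ast
import pickle


def _string_value(stmt):
    """Return the text if stmt is a bare string literal, else None."""
    if (isinstance(stmt, ast.Expr)
            and isinstance(stmt.value, ast.Constant)
            and isinstance(stmt.value.value, str)):
        return stmt.value.value
    return None


def collect_extended_docs(source, module_name):
    """Map dotted identifier -> the string literal after its docstring."""
    docs = {}

    def visit(node, name):
        body = node.body
        # body[0] is the ordinary __doc__ string; body[1], if it is also
        # a bare string, is taken to be the extended (STdoc) string.
        if len(body) >= 2 and _string_value(body[0]) is not None:
            extended = _string_value(body[1])
            if extended is not None:
                docs[name] = extended
        for child in body:
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef,
                                  ast.ClassDef)):
                visit(child, name + "." + child.name)

    visit(ast.parse(source), module_name)
    return docs


SAMPLE_SOURCE = '''
"""Prime number utilities."""
"""Extended notes for the whole module, written in ST."""

def primes(n):
    """Return the primes below n."""
    """
    Parameters:
        n -- The upper limit.
    """
    ...

def undocumented(x):
    return x
'''

docs = collect_extended_docs(SAMPLE_SOURCE, "sample")
blob = pickle.dumps(docs)       # what a side file could contain
restored = pickle.loads(blob)   # what the interpreter would read back
```

The strings are copied as-is with no formatting applied, so whatever
reads the pickle back is free to render them however it likes.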

>> 6- the documentation parsing tools should be capable of producing output in 
>>      many formats (manpages, plain text, html, latex, for a start), 

Definitely, although this may take some time to implement.  I would add
"dynamic navigation of docs from within the python interpreter" to 
the list of possible "outputs."

>> 7- the doc parsing tools should not need to import and run the module to 
>>       produce its documentation (for security reasons alone)

Leaves open the question of what to do with C extensions, etc...

>> 8- module Library Reference documentation should also be kept in the same
>>      file as the module source.  It should complement the API docs with 
>>      examples, extended discussions of usage, tutorials, test code, etc., 
>>      but should not duplicate the API reference material.

Tutorials, examples, etc. do *not* belong in the same source file.
That makes it much too hard to work with the source file.  How much
of a performance hit would we take if we turned *every* "standard"
module into a package?  That way we could have string.__tutorial__ etc
or whatever, if we wanted to..

Among other (very good) reasons, keeping them in the source file would
mean that only the writer(s) of a module could write tutorials,
discussions of usage, etc. for it..

>> 9- the Library Reference docs should be written in string literals, as with
>>     the extended API docs proposed in pt. 5, but there should be a prefix
>>     token such as """LIBREF:  at the start of each chunk to signal to the
>>     doc tools that the following text is not part of the API ref.  The
>>     token would allow this documentation to be split up into chunks that
>>     can appear anywhere in the source file (a la perl's POD).

Possibly in a different file..  I find Tibbs's arguments pretty convincing..

>> 10- the Library Reference documentation should also be written in ST as
>>      using LaTeX here would force the module author to learn yet another
>>      mark-up language, require the documentation user to install yet
>>      another processing tool (although this isn't an issue on Linux), and
>>      would place too much emphasis on the separation between the API and
>>      library reference docs and discourage synchronization as the module
>>      evolves!  The same argument applies to maintaining the status quo of
>>      external doc files.

Sounds reasonable to me..  Anyone want to try taking a few modules and
converting their docs to ST, just to see what issues come up?

>>       Any extra meta-data that is needed for proper indexing, etc. 
>>       (to meet Guido's concerns) should be included as fields in the string 
>>       literals as is done in JavaDoc (but not necessarily with that syntax).

I think all this information should appear in description list items
with reserved keys, or under reserved heading keys, as I mentioned
above..  

>> - caching of documentation so it doesn't have to be regenerated 
>>   every time it's used
Seems like an implementation detail of tools, not part of a PEP
describing what should go into docstrings.

>> - documenting Packages
There definitely need to be provisions for that.

>> - inheriting documentation (Edward Loper's idea)
Well, really javadoc's or someone before them..

>> - hiding API docs for __privateInternals (ditto)
This seems like an implementation detail for tools..  (But I agree
it *should* be implemented on most tools :) )

>> - documenting extensions in other languages
Much easier if we can import modules.  But I guess safety's important.
Oh well.

So I'll try to write up a PEP when I get a chance.  It sounds like
Tibbs might write a proposal too.  I think that Tibbs and I seem
to have similar views on a lot of issues, so if we want diversity
in our PEPs, maybe someone else should work on one too. :)  (Of 
course, this sort of seems like redundant work, but I guess it's
for the best or something)

-Edward