[Doc-SIG] Some random thoughts

Sun, 05 Mar 2000 11:22:00 GMT

Things have been a little quiet on here recently, once again confirming the
doc-sig's yo-yo nature; I sometimes wonder if people here are on a 3 month
hibernation cycle <wink>.

Anyway, I'd like to share a few thoughts with you based on my recent
experiences implementing my Crystal system: take them for what they're
worth. These are quite long, and maybe irrelevant; the recent massive debate
on hyperlinks etc came at a point where coding was more important to me
than reading so maybe some of these have been covered and maybe I'm just
being dumb...

For those of you who were asleep / bored / had server problems etc, Crystal
is a javadoc / pythondoc type system which you can download from:

  http://eh.org/~laurie/comp/python/crystal/

There is also some sample output on the page to give an idea of what it
produces for given inputs. Both of these are a little out of date (and
please bear in mind this project is in the *very* early days - it's not user
friendly although probably easier to get running than pythondoc!), but they
give the general idea... I badgered my university to let me produce Crystal
as my undergraduate degree project, so that may put a few things into
perspective in the following...

OK, the rest from here on is definitely all IMHO.

  StructuredText

One of the few things I did pull out of the recent debate was that it has
been decreed that inline documentation should be at least based on
StructuredText. That made me do a little U-turn as I had been working on
something a little different; fortunately Crystal works on the idea of
plugins so I just made a StructuredText compatible plugin[0]. Initially I
wanted to use Jim Fulton's module, but this turned out to be a bit of a
problem for the following reasons:

  * It seems to be geared up for subclasses returning strings (I needed a
      recursive data structure, not a string representation)
    In fact, realistically, the implementation is set up with only HTML in
      mind
  * There's no real documentation for the implementation
  * The implementation is *very* hard to understand if you haven't watched
      it evolve
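
To make the first point concrete, here is a minimal sketch of a recursive
structure that several back ends can walk, as opposed to a parser that hands
back a finished string. `Node`, `to_html` and `to_outline` are hypothetical
names for illustration only; they are not taken from Crystal or from the
StructuredText module, and the HTML writer does no character escaping.

```python
class Node:
    """Hypothetical parse-tree node: a kind, optional text, child nodes."""

    def __init__(self, kind, text="", children=None):
        self.kind = kind
        self.text = text
        self.children = children or []


def to_html(node):
    """One possible back end: walk the tree and emit HTML-style tags."""
    inner = node.text + "".join(to_html(child) for child in node.children)
    return "<%s>%s</%s>" % (node.kind, inner, node.kind)


def to_outline(node, depth=0):
    """A second back end over the same tree: an indented plain-text outline."""
    label = node.kind + (": " + node.text if node.text else "")
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(to_outline(child, depth + 1))
    return lines


doc = Node("ul", children=[Node("li", "element 1"), Node("li", "element 2")])
```

The same `doc` tree feeds both writers, which is the property a
string-returning parser cannot offer.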

So after a few hours attempting to munge the StructuredText module into
something I could use, I gave up and coded my own "compatible" version based
on the rules in the module and:

  http://www.zope.org/Members/millejoh/structuredText

I actually quite like the concept of StructuredText, but in my opinion,
there are some problems with the current specification. The major factor is
that the "Specification" isn't really that specific; one could argue that
"the implementation is the specification", but the current module is a
pretty hard implementation to understand. Some specific points include (in
no real order):

  * There is no protocol for escaping characters (ouch)

  * There is scope for ambiguous doc strings in the current spec. eg:

    """
    Here's some example code::

        def __init__(self):
             pass

        My next heading

    The start of a paragraph
    """

    Is "My next heading" part of the example code or a header to the
      following paragraph?

    Interestingly, on the zope.org page referenced above, at one point in
      the raw_text, there is:

    """
    Notes

      <>

    Including Structured Text in DML
    """

    And - unexpectedly - the notes header completely disappears in the
      output. Undocumented behaviour?

  * Are lists allowed to be recursive?
     * Should this work?
       * Is this out of this world?
    My implementation allows this (I find it useful)

  * Ordered lists are ill defined: can they go in any order, or should it
      be like the <OL Start = > HTML?

  * It is unclear whether the definition in a definition list should be
      allowed to take styles. eg is:

    """
       'code' -- description
       *emph* -- description
    """

    Is that going to work as one expects? In the development version of
      my implementation it does

  * The example code protocol is crufty & non-overridable. What should:

    """
    ...so for example:

       * element 1
       * element 2
    """

    do? C++ programs might get caught out with the actions of '::' in
      StructuredText <wink>

  * Forcing anything between ' ' into <Code> seems particularly clumsy; * has
      a good history of being an emphasis effect and ** is a cunning
      extension to that, but ' ' seems unnatural

    (Incidentally the implementation looks like <Code>something</Code> is
    valid but the spec mentions nothing)

  * From a purely Python perspective, having _ _ as the underline protocol
      tends to cause __init__ type method names to come out somewhat
      unexpectedly. But that's not StructuredText's fault <wink>

  * Should styles nest? So is *this **going** to work* ?
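
On the recursive-list question in the list above: nesting can be recovered
mechanically from indentation alone. The following is a minimal sketch under
that assumption; `parse_list` is a hypothetical helper, it only understands
'* ' bullets, and it is not how the StructuredText module or Crystal is
actually written.

```python
def parse_list(lines):
    """Turn '* item' lines into nested [text, children] pairs by indentation."""
    root = []
    # Each stack entry is (indent of the owning item, that item's child list).
    # The root uses -1 so it is never popped.
    stack = [(-1, root)]
    for line in lines:
        stripped = line.lstrip()
        if not stripped.startswith("* "):
            continue  # this sketch ignores anything that is not a bullet
        indent = len(line) - len(stripped)
        # Climb back up to the nearest item indented less than this one.
        while stack[-1][0] >= indent:
            stack.pop()
        children = []
        stack[-1][1].append([stripped[2:], children])
        stack.append((indent, children))
    return root


nested = parse_list([
    "* Are lists allowed to be recursive?",
    "   * Should this work?",
    "     * Is this out of this world?",
])
```

A deeper indent makes an item a child of the previous one; an equal or
shallower indent closes the open items first.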

The first point is the most crucial. Without an escape character protocol,
one can come up with all sorts of unpleasantness from the StructuredText
implementation.
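
As an illustration of what an escape protocol buys, here is a minimal sketch
that assumes a backslash as the escape character. That choice is an
assumption: the current spec defines no escape character at all. The `<u>`
output tag and both function names are likewise hypothetical. The sketch
does not handle an escaped backslash directly before an underscore.

```python
import re

# Naive rule: anything between two underscores is underlined.
NAIVE = re.compile(r"_(.+?)_")
# Same rule, but an underscore preceded by a backslash is not a delimiter.
ESCAPE_AWARE = re.compile(r"(?<!\\)_(.+?)(?<!\\)_")


def naive_underline(text):
    """Apply the underline rule with no way to protect literal underscores."""
    return NAIVE.sub(r"<u>\1</u>", text)


def underline_with_escapes(text):
    """Apply the rule, then drop the backslashes that protected characters."""
    marked = ESCAPE_AWARE.sub(r"<u>\1</u>", text)
    return re.sub(r"\\(.)", r"\1", marked)


mangled = naive_underline("__init__ is _special_")
# The escaped form keeps the method name intact:
# "__init__ is <u>special</u>"
intact = underline_with_escapes(r"\_\_init\_\_ is _special_")
```

Without the escape there is no way to write `__init__` so that it survives
the naive rule.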

I don't know what the definitive answer to some of the above problems is,
and maybe "the" implementation solves them; but from my point of view, the
current definition might get us into trouble in the long term by allowing
implementation specific things to accidentally work or fail. Python itself
gained much, I believe, from acquiring a second implementation in JPython,
which forced the standardisation of many things.

My opinion is that what is currently the complete specification for
StructuredText is basically about right for an "overview", but that there
then needs to be a further section where things are explained in complete
detail to resolve the ambiguities and problems I've mentioned.

  Current doc strings

One thing I don't think any of us are sure of is what we should do with all
the doc strings we've already got. Thankfully my experience is that most doc
comments are very nearly in a StructuredText compatible format anyway.
Certainly, running Crystal over the 1.5.2 standard library produced fairly
good output with no alterations. As a test, I made a few quick minor
alterations to some files to bring them into line with StructuredText:
thankfully I don't see that doing it to even a large number of files is
actually going to be that difficult or time consuming.

Here are some things that tend to cause complications in current doc
strings:

  * Many look like:

    def __init__(self):
    """The first line

       The rest of the body

       Blah blah
    """

    What exactly is the indentation of that whole thing? To the human eye
      the answer is 8. In my implementation the above *doesn't* do what you
      expect because the first line gets an indentation of 0 and the rest
      an indentation of 8 (meaning the body becomes a sub paragraph of the
      first line). Writing it like this avoids the problem:

    def __init__(self):
    """
       The first line

       The rest of the body

       Blah blah
    """

  * Sometimes lists and so on look like this:

    """
    - element 1 gets
    split over two lines
    - and so does
    element 2
    """

    whereas the way I read StructuredText (and the way I prefer things)
      means it should be:

    """
    - element 1 gets
      split over two lines
    - and so does
      element 2
    """

  * Example code sometimes isn't properly flagged (but see my earlier point)

As you can see none of these is exactly serious. It is my opinion that it is
*not* necessary to simply output all current doc comments in a monospaced
pre-formatted font: aesthetically this is awful, and practically I think
I've demonstrated with Crystal that it's not really necessary.
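
The first-line indentation problem described above could also be handled in
the tool rather than in the doc strings. A minimal sketch, assuming the
margin is taken from the non-blank lines after the first; `normalise_docstring`
is a hypothetical helper and is not Crystal's actual code.

```python
def normalise_docstring(doc):
    """Strip the body's common indentation and align the first line with it."""
    lines = doc.expandtabs().split("\n")
    # Measure indentation on non-blank lines after the first only: the first
    # line starts right after the opening quotes, so it always reports 0.
    body = [line for line in lines[1:] if line.strip()]
    margin = min((len(line) - len(line.lstrip()) for line in body), default=0)
    result = [lines[0].strip()]
    for line in lines[1:]:
        result.append(line[margin:].rstrip())
    # Drop blank lines at either end so both layouts come out identical.
    while result and not result[0]:
        result.pop(0)
    while result and not result[-1]:
        result.pop()
    return "\n".join(result)


first_on_quote_line = "The first line\n\n       The rest of the body\n    "
first_on_own_line = "\n       The first line\n\n       The rest of the body\n    "
```

With this step in front of the parser, both layouts shown earlier yield the
same text with every paragraph at indentation 0.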

Unsurprisingly, I also have some thoughts about how to construct the "right"
tool for the job, but I think this little lot is enough for one post, so
comments etc are welcome.

Laurie

[0] So, yes, you can plug in different parsers for different inline
      comment syntaxes. You want POD? Fine. JavaDoc? Fine. So long as
      someone codes a plugin, this is one less thing hardcoded into
      the system
-- 
http://eh.org/~laurie/