[Doc-SIG] syntax vs semantics: implicit --> explicit

Fri, 30 Mar 2001 13:57:03 -0500

I've found that many of our discussions about auto-documentation
generators unnecessarily (and confusingly) mix arguments from
different levels (syntax vs. semantics, and multi-layered semantics at
that). To make the implicit explicit, and to reduce confusion and
frustration, I think it's important to separate our discussions by
individual component. At the very least, we should be conscious of
'where we're coming from' and say so explicitly.

For example, I think it's counterproductive to talk about the syntax
of a particular construct (e.g. characters used to delimit literals)
in the same breath as talking about a Python-specific concept (e.g.
hyperlinks generated from the interpretation of literals in a
Python-specific context). If the syntax is right, the semantics should
fit. Of course, the syntax discussion is at least partially being
driven by semantics. I am proposing that we be more explicit about the
motivations behind our suggestions.

On to a definition of terms, using block diagrams (useful for a
blockhead like me :-):

The parser is the basic component which takes raw text as input and
produces a data structure as output::

             +--------+
    text --> | parser | --> parsed data structure
             +--------+     (internal, e.g. DOM tree)

Depending on what we want to do with the data, we'll need output
formatters::

                        +-----------+
    structured data --> | formatter | --> formatted data
    (internal)          +-----------+     (XML, HTML, TeX, info, etc.)

A simple converter program would just need to link the two::

             +------------------------------+
             |         converter            |
             | +--------+     +-----------+ |
    text --> | | parser | --> | formatter | | --> formatted data
             | +--------+     +-----------+ |
             +------------------------------+
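
To make the boxes concrete, here is a minimal sketch of the same
pipeline in Python. Everything in it is an illustrative assumption
rather than a proposal: the ``Node`` class stands in for a DOM-like
tree, the "markup" is nothing more than blank-line-separated
paragraphs, and the names ``parse``, ``format_html``, ``format_text``
and ``convert`` are made up for this example.

```python
import html
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the internal tree (a stand-in for a DOM tree)."""
    kind: str                                   # e.g. "document", "paragraph"
    text: str = ""
    children: list = field(default_factory=list)


def parse(text):
    """Parser: raw text in, data structure out.

    Hypothetical toy grammar: blank lines separate paragraphs.
    """
    doc = Node("document")
    for block in text.strip().split("\n\n"):
        if block.strip():
            # Collapse line breaks and runs of whitespace inside a paragraph.
            doc.children.append(Node("paragraph", " ".join(block.split())))
    return doc


def format_html(doc):
    """Formatter: internal structure in, HTML out."""
    return "\n".join(f"<p>{html.escape(p.text)}</p>" for p in doc.children)


def format_text(doc):
    """A second formatter working from the same tree: plain text out."""
    return "\n\n".join(p.text for p in doc.children)


def convert(text, formatter=format_html):
    """Converter: does nothing except link a parser to a formatter."""
    return formatter(parse(text))
```

Calling ``convert(source)`` yields HTML and ``convert(source,
formatter=format_text)`` yields plain text. The parser is untouched in
both cases, which is the point of keeping the two boxes separate.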

Now, when we get into auto-doc-generators (like HappyDoc, Crystal,
pydoc, etc.), we need to add Python-specific knowledge to the mix::

    +-------------------------------------------+
    | Python Documentation Processor            |
    |                                           |
    | +---------------------------------------+ |
    | | operating logic:                      | |
    | | knowledge of Python syntax, docstring | |
    | | conventions and rules                 | |
    | +---------------------------------------+ |
    |                                           |
    | +-----------------+        +------------+ |
    | | Structured Text |        | output     | |
    | | parser          |        | formatters | |
    | +-----------------+        +------------+ |
    | | Python-specific |                       |
    | | extensions      |  +------------------+ |
    | +-----------------+  | Python language  | |
    | +--------------+     | services         | |
    | | (potentially |     | (parser.py, xml, | |
    | |  other input |     | inspect, etc.)   | |
    | |  parsers)    |     |                  | |
    | +--------------+     +------------------+ |
    +-------------------------------------------+
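
As a rough sketch of how those boxes might cooperate, the fragment
below uses the standard ``inspect`` module as the "Python language
services" box. The helper names (``parse_docstring``, ``format_plain``,
``document``) and the rule "skip names starting with an underscore" are
assumptions invented for illustration; they are not how HappyDoc,
Crystal or pydoc actually work.

```python
import inspect


def parse_docstring(text):
    """Stand-in for the markup parser: split a docstring into paragraphs."""
    return [" ".join(p.split()) for p in text.split("\n\n") if p.strip()]


def format_plain(name, paragraphs):
    """Stand-in output formatter: an underlined heading, then paragraphs."""
    return "\n".join([name, "-" * len(name), *paragraphs])


def document(cls):
    """Operating logic: decide which objects get documented, then delegate.

    The convention applied here (public functions only) is a made-up
    example of a docstring rule; it lives outside the parser.
    """
    targets = [(cls.__name__, cls)]
    targets += [
        (f"{cls.__name__}.{name}", member)
        for name, member in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith("_")
    ]
    sections = []
    for name, target in targets:
        doc = inspect.getdoc(target)   # language service: fetch and dedent
        if doc:
            sections.append(format_plain(name, parse_docstring(doc)))
    return "\n\n".join(sections)
```

Note that ``document`` knows about classes, methods and naming
conventions, while ``parse_docstring`` knows only about text. Either
could be swapped out without touching the other.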

I don't know about others on this list, but I would like to use an
ST-like (Structured Text) markup language for more than just Python
docstrings. I'd like to use it for documentation of all kinds, from
how-to manuals to web pages (to books even, for crazies like me).
Section hierarchy features (section titles) matter less in Python
docstrings than they do in, say, a magazine article. This forum, of
course, is specifically geared toward Python documentation. But am I
unreasonable in thinking that this markup scheme has broader
applications? See the Setext specification
(http://www.bsdi.com/setext/) for some history; basically, Setext was
used for a pre-web electronic newsletter, TidBITS, whose texts were
quite long.

(Last year I wrote a chapter on Python for Wrox Press' "Professional
Linux Programming". I would have been much happier using a complete
ST-like markup than futzing around in MSWord.)

I believe that the operating logic/rules/conventions ought to be
separated conceptually and code-wise from the parser. The parser
itself should be separated into generic and Python-specific parts.
These things should not be tied together, at least not so strongly.
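
A small sketch of what the generic/Python-specific split could look
like, using the literal-to-hyperlink example from earlier in this
message. The backquote delimiter, the token tuples and the function
names are all assumptions chosen for illustration; which delimiter to
use is exactly the kind of syntax question that can be settled on its
own.

```python
import re

# Syntax level: which characters delimit a literal (assumed: backquotes).
LITERAL = re.compile(r"`([^`]+)`")


def parse_inline(text):
    """Generic part: split text into ("text", s) and ("literal", s) tokens.

    Nothing here knows anything about Python.
    """
    tokens, pos = [], 0
    for match in LITERAL.finditer(text):
        if match.start() > pos:
            tokens.append(("text", text[pos:match.start()]))
        tokens.append(("literal", match.group(1)))
        pos = match.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens


def link_python_names(tokens, namespace):
    """Python-specific part: retag literals that name a known object.

    Semantics level: a literal becomes a "pyref" (a hyperlink candidate)
    only when it is found in the given namespace.
    """
    return [
        ("pyref", value) if kind == "literal" and value in namespace
        else (kind, value)
        for kind, value in tokens
    ]
```

For example, ``link_python_names(parse_inline("Call `len` on `x`."),
{"len": len})`` tags ``len`` as a reference and leaves ``x`` as a plain
literal. A how-to manual or web page would simply skip the second step.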

Opinions? Flames? I've got my asbestos suit on!

Thanks for reading my idle ramblings!

/DG