[Doc-SIG] scope of the parser

David Goodger goodger@users.sourceforge.net
Sat, 17 Nov 2001 01:22:40 -0500


Alan Jaffray wrote:
> Since the rules for which references are associated with which
> targets are defined by reStructuredText, shouldn't the parser
> explicitly state which target is to be used for each reference?
> Ditto for footnotes, inline directive references, etc.  It seems
> misplaced for an output formatter to be dealing with details of
> autonumbering, ambiguous refnames, etc.

This is what I call "linking" (just like compilers... I remember
those... glad I don't have to deal with them any more).  The writer
doesn't do the linking.  The parser can't do the linking either,
because it may not have complete information during (or at the end of)
any one run.  It's the Reader component that does the linking.
Linking is one example of a transform.  Transforms will be controlled
by the Reader component.

For example, when processing Python source docstrings, there will
probably be a lot of cross-references between individual docstrings.
Each docstring is essentially a separate and independent document.
The "Python Source Reader" extracts the docstrings, sends them through
the parser, knits the results together into a single coherent doc
tree, and *then* the linking can take place.
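
To make the order of operations concrete, here is a rough sketch of that flow.  None of this is DPS code: `parse`, `knit`, and `link` are names invented for illustration, the "doc tree" is a plain dict, and the markup is cut down to `.. _name:` targets and `name_` references.  The point is only that the reference in one docstring cannot be resolved until the tree containing the other docstring exists.

```python
def parse(name, text):
    """Toy parser: record targets and references, but resolve nothing."""
    targets, refs = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(".. _") and line.endswith(":"):
            targets.append(line[4:-1])
        else:
            refs.extend(w[:-1] for w in line.split()
                        if w.endswith("_") and len(w) > 1)
    return {"name": name, "targets": targets, "refs": refs}


def knit(fragments):
    """Reader step: combine the per-docstring trees into one tree."""
    return {"children": list(fragments)}


def link(tree):
    """Transform: resolve references against targets from the whole tree."""
    where = {t: frag["name"]
             for frag in tree["children"] for t in frag["targets"]}
    resolved, unresolved = {}, []
    for frag in tree["children"]:
        for ref in frag["refs"]:
            if ref in where:
                resolved[(frag["name"], ref)] = where[ref]
            else:
                unresolved.append((frag["name"], ref))
    return resolved, unresolved


# Two docstrings, parsed independently; the reference in the first one
# points at a target that only exists in the second.
docstrings = {
    "mod.f": "See helper_ for details.",
    "mod.g": ".. _helper:\n\nThe helper target lives here.",
}
tree = knit(parse(n, t) for n, t in docstrings.items())
resolved, unresolved = link(tree)
```

Run on `mod.f` alone, `link` would have to report `helper` as unresolved, which is why the parser is the wrong place for it.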

Looks like it's time for another... DPS Components Diagram!

Here's my current thinking::

           +--------+                   +--------+
           | READER | ----------------> | WRITER |
           +--------+                   +--------+
             /    \                       /    \
            / .... \                     /      \
           / /    \ \                   /        \
    +--------+   +------------+   +---------+   +------------+
    | PARSER |   | transforms |   | stylist |   | deployment |
    +--------+   |            |   +---------+   +------------+
                 | - docinfo  |       (?)
                 | - titles   |
                 | - linking  |
                 | - lookups  |
                 | - etc.     |
                 +------------+

UPPERCASE names are major DPS components; only one of each type is
used per document.  They are chosen either by the user or based on the
input.  Lowercase names are groups of common services used dynamically
as required.

The dotted line between the parser and the transforms indicates that
the choice and order of transforms used will depend on the parser as
well as the reader.  Some transforms used on doc trees generated from
reStructuredText will not be required for doc trees generated from
other markup.  The reader matters too: some transforms will be used
only by the Python Source Reader, others only by the Standalone
Document Reader, and some by both.
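
One way this selection might look, purely as a sketch: the transform names come from the diagram, but the two tables, the "othermarkup" parser, the assignment of transforms to parsers and readers, and the resulting order are all assumptions made up for the example, not a design decision.

```python
# Hypothetical tables: which transforms each parser and each reader asks for.
PARSER_TRANSFORMS = {
    "restructuredtext": ["docinfo", "titles", "linking"],
    "othermarkup": ["linking"],
}
READER_TRANSFORMS = {
    "python_source": ["lookups", "linking"],
    "standalone": ["docinfo", "linking"],
}


def select_transforms(parser, reader):
    """Return the ordered transforms for one parser/reader pair.

    Parser-requested transforms come first, then the reader's; a
    transform requested by both is listed only once.
    """
    ordered = []
    for name in PARSER_TRANSFORMS[parser] + READER_TRANSFORMS[reader]:
        if name not in ordered:
            ordered.append(name)
    return ordered


python_source = select_transforms("restructuredtext", "python_source")
standalone_other = select_transforms("othermarkup", "standalone")
```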

I believe that if required at all, "stylists" will be specific to each
Writer.  They'll transform documents into different layouts.  The
"deployment" services will comprise at least: output to a single file,
output to multiple files in a directory structure, and output to
objects in memory.

A lot of this is still up in the air, until concrete implementations
of each part are complete.

> Meanwhile, since the meaning of directives and the set of meaningful
> directive names is *not* defined by reStructuredText, shouldn't the
> rST parser output most directives (including unknown ones)
> untouched?

They're not necessarily defined in the reStructuredText *spec*, but they
*are* parser constructs; directives are taken care of by the parser,
and only by the parser.  Directive-handling code must be present at
parse time; if the parser encounters an unknown directive, an error is
generated.  Directives must transform their data & blocks to doc tree
elements at parse time.  If any further processing is to be done down
the road, the directive generates specialized elements which can be
processed by a transform or at some later stage.  But once it leaves
the parser, there's no longer a "directive" as such.
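
A minimal sketch of what "directives are parser constructs" means in practice.  The registry, the `parse_directive` function, and the dict-shaped elements are all hypothetical, and the sketch raises an exception for an unknown directive where the real parser may well report the error some other way.

```python
class UnknownDirectiveError(Exception):
    """Raised here when no handler is registered for a directive name."""


def note_directive(block):
    # The directive turns its block into ordinary doc tree elements.
    return {"tag": "note", "children": [{"tag": "paragraph", "text": block}]}


# Handlers must be registered before parsing starts.
DIRECTIVES = {"note": note_directive}


def parse_directive(line, block=""):
    """Handle a bare '.. name::' line and return the element it produces."""
    if not (line.startswith(".. ") and line.endswith("::")):
        raise ValueError("not a directive: %r" % line)
    name = line[3:-2].strip().lower()
    handler = DIRECTIVES.get(name)
    if handler is None:
        raise UnknownDirectiveError(name)
    return handler(block)


element = parse_directive(".. note::", "Be careful.")

# An unregistered name fails at parse time, not later.
try:
    parse_directive(".. bogus::")
    error = None
except UnknownDirectiveError as exc:
    error = str(exc)
```

Note that `element` carries no trace of having come from a directive; it is just a "note" element.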

For example, say you have a directive "TOC" which is meant to generate
a table of contents.  The table of contents typically appears at the
head of a document, but you can't generate it until the entire
document has been processed.  So we insert a ".. toc::" directive
where we want the TOC to be, and the TOC directive generates a "<toc
/>" placeholder element.  A transform further downstream could
recognize the "<toc />" element, generate a full table of contents,
and substitute it in place.

Such directives would be two- (or more) step processes.
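
The two steps could be sketched like this.  The function names and the dict-based tree are assumptions for illustration only, and the transform looks at top-level sections only to keep the example short.

```python
def toc_directive():
    """Step 1, at parse time: emit a placeholder element."""
    return {"tag": "toc", "children": []}


def toc_transform(tree):
    """Step 2, after parsing: replace each placeholder with a real list."""
    titles = [node["title"] for node in tree["children"]
              if node["tag"] == "section"]
    for i, node in enumerate(tree["children"]):
        if node["tag"] == "toc":
            tree["children"][i] = {
                "tag": "bullet_list",
                "children": [{"tag": "list_item", "text": title}
                             for title in titles],
            }
    return tree


# The placeholder sits at the head of the document, ahead of the
# sections it will eventually list.
tree = {"tag": "document", "children": [
    toc_directive(),
    {"tag": "section", "title": "Introduction", "children": []},
    {"tag": "section", "title": "Usage", "children": []},
]}
toc_transform(tree)
toc_entries = [item["text"] for item in tree["children"][0]["children"]]
```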

Perhaps I should remove the "directive" element from the DTD and
dps/nodes.py.  I'm only using it for testing right now.  All true
directives generate proper doc tree elements.

Yes, I think I will remove it.  Its presence is confusing at best.

> It seems advantageous to be able to decouple directive
> processing and implementation from reStructuredText parsing and
> implementation.

We can't do that, because directives are specifically allowed to reuse
the parser for their own purposes.  For example, the admonition
directives (note, caution, danger, etc.) recursively generate and run
a parser state machine to do most of their work for them.
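
As a rough illustration of that coupling (not the actual implementation): the toy below uses a plain recursive function where the real parser uses a state machine, and `parse`, `make_admonition`, and `ADMONITIONS` are made-up names.  What it shows is that the directive cannot do its job without calling back into the parser.

```python
import textwrap


def parse(text):
    """Toy block parser: paragraphs plus '.. name::' admonitions."""
    lines = text.splitlines()
    nodes, i = [], 0
    while i < len(lines):
        line = lines[i]
        if not line.strip():
            i += 1
        elif line.startswith(".. ") and line.rstrip().endswith("::"):
            name = line.rstrip()[3:-2]
            i += 1
            body = []
            # The directive's block is every following indented or blank line.
            while i < len(lines) and (lines[i].startswith(" ")
                                      or not lines[i].strip()):
                body.append(lines[i])
                i += 1
            # An unregistered name raises KeyError here, at parse time.
            nodes.append(ADMONITIONS[name]("\n".join(body)))
        else:
            para = []
            while i < len(lines) and lines[i].strip():
                para.append(lines[i].strip())
                i += 1
            nodes.append({"tag": "paragraph", "text": " ".join(para)})
    return nodes


def make_admonition(tag):
    def directive(block):
        # Reuse the parser on the directive's own (dedented) content.
        return {"tag": tag, "children": parse(textwrap.dedent(block))}
    return directive


ADMONITIONS = {name: make_admonition(name)
               for name in ("note", "caution", "danger")}

source = """\
Intro paragraph.

.. note::

   Outer note.

   .. danger::

      Nested danger.

After the note.
"""
tree = parse(source)
```

The nested "danger" inside the "note" falls out for free, because the admonition body goes through the same parser as the document itself.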

-- 
David Goodger    goodger@users.sourceforge.net    Open-source projects:
 - Python Docstring Processing System: http://docstring.sourceforge.net
 - reStructuredText: http://structuredtext.sourceforge.net
 - The Go Tools Project: http://gotools.sourceforge.net