[Doc-SIG] Re: directives and fields

Edward D. Loper edloper@gradient.cis.upenn.edu
Wed, 18 Apr 2001 13:24:30 EDT


> > Yes.  But from the parser's point of view, it can be anything, because
> > it doesn't know what extensions you'll be using.  Some later stage
> > (after the parser) will put restrictions on it..
> 
> Not true. I'd like to clear up this concept of directives. It's completely
> different from your proposed field concept, though not necessarily
> incompatible.

Well, at least there should be rules in the "generic parser" that say
when directives end, so that a parser can ignore a directive if it
doesn't understand it.  As I understood your original proposal,
directives ended with blank lines.  I think that they should end with
a dedent back to the indent they started at, because then they can
include blank lines.
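
A minimal sketch of that dedent rule, for illustration only.  The
function name ``directive_extent`` and the exact rule (a directive ends
at the first non-blank line indented no deeper than the directive's
first line) are assumptions, not part of any agreed proposal:

```python
def directive_extent(lines, start):
    """Return the index one past the last line of the directive at lines[start].

    Assumed rule: the directive ends at the first non-blank line whose
    indentation is no deeper than that of the directive's first line.
    """
    def indent(line):
        return len(line) - len(line.lstrip(" "))

    base = indent(lines[start])
    end = start + 1
    for i in range(start + 1, len(lines)):
        line = lines[i]
        if not line.strip():
            continue  # a blank line alone never ends the directive
        if indent(line) <= base:
            break  # dedent back to the starting indent: directive is over
        end = i + 1  # indented content line belongs to the directive
    return end


lines = [
    ".. warning::",
    "",
    "    Don't press the button.",
    "",
    "    You'll be sorry.",
    "",
    "Back to normal text.",
]
extent = directive_extent(lines, 0)
```

Because the end is found from indentation alone, a parser that does not
recognize ``warning`` could still skip over ``lines[0:extent]``.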

And I think that it should be *possible* to handle directives in a
second pass.  I.e., I don't think we should have any directives that
change the syntax of subsequent parts of the string, like::

   This is *emph*

   .. switch-emph-and-literal

   This is *literal*

Basically, it seems like you should be able to make a "generic" parser
which outputs a DOM tree for the formatted docstring, with "directive"
elements containing #CDATA (=character data, i.e., a string) like::

    <directive tag="keywords"> ... </directive>

Then a specialized parser could run the generic parser, and then
replace all the directive elements with some other elements.
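
A rough sketch of that two-pass idea, using Python's standard
``xml.dom.minidom``.  The ``expand_directives`` function, the
``keywords_handler`` and the sample document are all hypothetical; the
"generic parser" output is simply hand-written XML here:

```python
from xml.dom import minidom


def expand_directives(doc, handlers):
    """Second pass: replace each <directive> element that has a handler."""
    # Copy the node list first, because the tree is modified in the loop.
    for node in list(doc.getElementsByTagName("directive")):
        handler = handlers.get(node.getAttribute("tag"))
        if handler is None:
            continue  # unknown directive: leave the element untouched
        text = "".join(
            child.data
            for child in node.childNodes
            if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
        )
        node.parentNode.replaceChild(handler(doc, text), node)
    return doc


def keywords_handler(doc, text):
    """Hypothetical handler: 'a, b' becomes <keywords><keyword>a</keyword>...</keywords>."""
    element = doc.createElement("keywords")
    for word in text.split(","):
        item = doc.createElement("keyword")
        item.appendChild(doc.createTextNode(word.strip()))
        element.appendChild(item)
    return element


# Stand-in for what a generic parser might emit.
generic_output = minidom.parseString(
    "<docstring><para>Sorts a list.</para>"
    '<directive tag="keywords">sort, list</directive>'
    '<directive tag="unknown">left alone</directive></docstring>'
)
result = expand_directives(generic_output, {"keywords": keywords_handler})
```

The directive with no registered handler survives as a plain
``<directive>`` element, so a tool that doesn't understand it can still
ignore it.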

> Inevitably, someone will want to add a feature or some behaviour to
> the reStructuredText parser which cannot be easily added through
> character-construct syntax, because:

I think we should -try- to keep feature-adding to a minimum, because
it tends to result in incompatibilities.  But that said, it does make
sense to me to have a generic extension mechanism, as long as we keep
in mind that we should be careful about not over-using it.  Also,
people adding new directives should keep in mind that "raw text should
be as readable as possible" (or whatever variant of that we decide we
like; see below).

I saw fields as being an extension mechanism, but a *much* more
constrained one than directives.  I think it makes sense to put *some*
constraints on directives (e.g., that they don't affect anything
outside themselves).  But maybe just using fields places too many
constraints.

> 1. There's no natural or obvious candidate characters or constructs
>    for syntax.
> 2. We've run out of characters to use as syntax.
> 3. The new feature or behaviour is too narrowly application- or
>    domain-dependent.

The only domain I care about is formatted docstrings.  Granted, there
are subdomains of formatted docstrings (some types of
programs/programming style will make use of some features, others
not).  But I'm not sure that they vary enough that we want a nearly
arbitrarily powerful extension mechanism.

As for running out of characters to use as syntax, that's one of the
reasons I don't like *colorizing* `like this`...  

> With one construct (regexp '^\.\. ', which comes from Setext) we
> have comments, internal hyperlink targets, external URL hyperlinks,
> footnotes, and directives. Directives were proposed as a mechanism
> for adding explicit syntax that the parser can recognize, triggering
> parser extension code.

I think that my target is a much more lightweight markup language than
you're talking about, or at least a less powerful one.  I really don't
see the need for most of those things in docstrings.

> Say we add an 'SQL' extension to the parser, which performs a
> database query and inserts the results.

Wouldn't this totally violate making the docstring readable?  And when
would you ever want to use this when writing a docstring??

>    .. warning::
> 
>        Don't *ever* press the `Self-Destruct` button.
>        If you do, you'll be sorry.

This could be implemented as a field.  I think that external URL
hyperlinks should be implemented with colorizing, if at all.  I don't
think that internal hyperlink targets make sense for docstrings.  I
don't think that comments are necessary for docstrings.  If you really
want, you can include a Python comment before or after the docstring.
Alternatively, comments could be done via colorizing.

> Your field concept could be implemented using the '@' syntax as
> proposed, or using the extension mechanism. If it's important
> enough, *and* the syntax is natural enough, using the JavaDoc '@'
> syntax is no problem. The '@' syntax doesn't strike me as natural
> though.

I agree that the "@" syntax isn't very natural (except to the extent
that it's natural simply because it's an established convention,
similar to the way that "\" is a "natural" way to escape a character).
I'd be just as happy writing fields like::

  .. param size: The number of elements in the list.

or::

  .. parameters::
     size: The number of elements in the list.

Although that seems no less readable to me than "@".  But I question
whether we want/need something as powerful as directives...
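
To make the first spelling concrete, here is a small sketch that
recognizes a one-line field of the form ``.. tag arg: body``.  The
regular expression and the ``parse_field`` name are assumptions made up
for this illustration; nothing here is a settled syntax:

```python
import re

# Hypothetical pattern for a one-line field such as
#   .. param size: The number of elements in the list.
FIELD_RE = re.compile(r"^\.\. (?P<tag>\w+) (?P<arg>\w+): (?P<body>.*)$")


def parse_field(line):
    """Return (tag, arg, body) for a one-line field, or None if no match."""
    match = FIELD_RE.match(line.strip())
    if match is None:
        return None
    return match.group("tag"), match.group("arg"), match.group("body")


field = parse_field("  .. param size: The number of elements in the list.")
```

The grouped ``.. parameters::`` form would need the indentation-based
handling instead, so this pattern deliberately does not match it.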

> The cornerstone of the Setext/StructuredText-like approach is that
> the raw text should be as readable as possible, even to the
> uninitiated.

I don't see how directives win here.  If anything, it seems like they
will make it harder for the uninitiated to read, given the power of
directives to use almost arbitrary syntax.

However, the idea that "raw text should be as readable as possible,
even to the uninitiated" is a *goal* of mine, but not a cornerstone.
Perhaps a cornerstone would be::

  Raw text should be readable, even by the uninitiated.

There are a lot of conflicting goals in designing a markup language,
and making it as readable as possible is by no means my most
fundamental goal.  In the case of colorizing, I believe that
colorizing should *never* be necessary to the understanding of a
docstring; i.e., you should be able to strip away all colorizing and
still understand what it says.  I think that the uninitiated will be
able to do that (and indeed I think it would be their first instinct).
When I first read perldoc comments, I didn't know what the C<..>s 
meant, but I ignored them, and was able to read the comments with
no trouble (well, the =.. directives were a bit confusing).

I guess that perhaps what it comes down to is that I am *not*
necessarily trying to design a Setext/StructuredText-like language.
I'm trying to design a markup language that is optimal for writing
Python docstrings.  

The problem with colorizing like *this* is that there are very few
conventions about what such colorizing means.  Indeed, I'd say that
*emph*, _underline_, and "quoting" 'of' `some' `sort` are the only
conventional ways of colorizing (well, maybe angle brackets for <uris>).
And none of the quoting mechanisms have conventional "colors"
associated with them.  In my mind, the only advantage of using
`quotes` over C{curly braces} is that quotes are easier to ignore.
In both cases, the uninitiated will (maybe) know that the region is
"colorized" in some way, but not what way it's colorized in.

-Edward