[Doc-SIG] RE: directives and fields

Goodger, David dgoodger@atsautomation.com
Thu, 19 Apr 2001 11:22:52 -0400


[Edward D. Loper]
> Well at least there should be rules in the "generic parser" that say
> when directives end, so that a parser can ignore a directive if it
> doesn't understand it.  As I understood your original proposal,
> directives ended with blank lines.  I think that they should end with
> a dedent back to the indent they started at, because then they can
> include blank lines..

From the reStructuredText spec, first draft:

"""
A comment/directive block is a text block:

- whose first line begins with '.. ' in column 1,
- whose second and subsequent lines are indented relative to the first, and
- which ends with a blank or unindented line.

...

Actions taken in response to directives and the
interpretation of data in the directive block or subsequent text block(s)
are directive- and implementation-dependent.
"""

I would only change the third list item to 'which ends with an unindented
line'.
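
To make the amended rule concrete, here is a minimal Python sketch of how a generic parser might find the end of a directive block without understanding the directive. The function name ``split_directive_block`` and the list-of-lines input are assumptions for illustration only, not part of the spec or of any existing parser.

```python
def split_directive_block(lines, start):
    """Return (block_lines, next_index) for the block starting at lines[start].

    Assumed rule (the amended third list item): the block ends at the first
    non-blank line that is not indented. Blank lines do not end the block.
    """
    if not lines[start].startswith(".. "):
        raise ValueError("not a comment/directive start")
    end = start + 1
    while end < len(lines):
        line = lines[end]
        if line.strip() and not line[0].isspace():
            break  # unindented, non-blank line: the block is over
        end += 1
    block = lines[start:end]
    # Trailing blank lines separate blocks; they are not part of this one.
    while block and not block[-1].strip():
        block.pop()
    return block, end


sample = [
    ".. warning::",
    "",
    "   Don't *ever* press the button.",
    "",
    "   If you do, you'll be sorry.",
    "",
    "Next paragraph.",
]
block, next_index = split_directive_block(sample, 0)
```

With this rule the blank line between the two indented paragraphs stays inside the block, and a parser that does not recognize ``warning`` can still skip straight to ``sample[next_index]``.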

> And I think that it should be *possible* to handle directives in a
> second pass.

Sure, if that's what the extension wants to do. The extension itself is
called during parsing. If it's tied to a post-parse process, that's its own
business.

There are essentially two types of directives: extensions, which apply to
their blocks only; and plugins, which may change the behaviour of the parser
for some defined part of the input (may be for the adjacent text block, may
be globally). Justification for plugins: it would be useful to modify the
parser's behaviour on the fly, without having to subclass. For example, a
'fields' plugin could add support for the '@' syntax, allowing
experimentation & testing. Kind of like the 'from __future__ import' hack.
;->

> I.e., I don't think we should have any directives that
> change the syntax of subsequent parts of the string, like::
>
>    This is *emph*
> 
>    .. switch-emph-and-literal
> 
>    This is *literal*

Of course, no such directive would be part of the standard package. Only a
lunatic would play games like this. But it would be a great way for people
to play with alternate syntax.

> Basically, it seems like you should be able to make a "generic" parser
> which outputs a DOM tree for the formatted docstring, with "directive"
> elements containing #CDATA (=character data, i.e., a string) like::
> 
>     <directive tag="keywords"> ... </directive>
> 
> Then a specialized parser could run the generic parser, and then
> replace all the directive elements with some other elements..

If the extension/directive wants to do this, fine. But what if it just wants
to wrap the normal behaviour of the parser with a new tag?
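
For reference, the two-pass scheme Edward describes might look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the names ``generic_parse``, ``apply_directives`` and ``warning``, the element names, and the simplification that blocks are separated by blank lines. It is not code from any existing parser.

```python
import textwrap
import xml.etree.ElementTree as ET


def generic_parse(text):
    """First pass: build paragraphs and uninterpreted directive elements."""
    root = ET.Element("document")
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if not line.strip():
            i += 1
            continue
        j = i + 1
        if line.startswith(".. ") and line.rstrip().endswith("::"):
            # Directive: take indented and blank lines up to the dedent.
            while j < len(lines) and (not lines[j].strip() or lines[j][0].isspace()):
                j += 1
            name = line[3:].rstrip()[:-2]
            node = ET.SubElement(root, "directive", {"tag": name})
            # The body is kept as raw character data, not parsed.
            node.text = textwrap.dedent("\n".join(lines[i + 1:j])).strip()
        else:
            # Paragraph: take lines up to the next blank line.
            while j < len(lines) and lines[j].strip():
                j += 1
            node = ET.SubElement(root, "paragraph")
            node.text = " ".join(part.strip() for part in lines[i:j])
        i = j
    return root


def apply_directives(root, handlers):
    """Second pass: replace directive elements that have a known handler."""
    for index, node in enumerate(list(root)):
        if node.tag == "directive" and node.get("tag") in handlers:
            root[index] = handlers[node.get("tag")](node.text)
    return root


def warning(body):
    """Hypothetical handler: wrap the raw body in a <warning> element."""
    node = ET.Element("warning")
    node.text = body
    return node


doc = (
    "Intro text.\n\n"
    ".. warning::\n\n    Don't press it.\n\n    Really.\n\n"
    "After.\n\n"
    ".. keywords::\n\n    a, b\n"
)
tree = apply_directives(generic_parse(doc), {"warning": warning})
```

Note what the sketch cannot do: the ``warning`` handler only ever sees a string, so the emphasis markup inside the body never reaches the normal parser. That is the gap the question above points at.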

> The only domain I care about is formatted docstrings.

That's a big enough domain with enough controversy to make the feature
necessary. See the archives. See this discussion! :-) It's been going on for
years, you know.

> As for running out of characters to use as syntax, that's one of the
> reasons I don't like *colorizing* `like this`...  

Then implement a POD-like language or a JavaDoc-like language or whatever.
This is clearly the dividing line: do you "buy in" to the
Setext/StructuredText concept or not?

> I think that my target is a much more lightweight markup language than
> you're talking about.. or at least less powerful.  I really don't see
> the need for most of those things in docstrings.

Again, read through the archives. Everyone has different opinions, everyone
wants different levels of control. If you don't want to use a particular
feature, don't. But someone else does. Please don't limit *me*.

It is my opinion that incomplete, minimal markup schemes are doomed to
failure, because *your* minimal set of features doesn't match *my* set or
*anybody else's*. At least at the discussion level. ;-)

> > Say we add an 'SQL' extension to the parser, which performs a
> > database query and inserts the results.
> 
> Wouldn't this totally violate making the docstring readable?  And when
> would you ever want to use this when writing a docstring??

Just an example, not a serious proposal. C'mon, lighten up!

> >    .. warning::
> > 
> >        Don't *ever* press the `Self-Destruct` button.
> >        If you do, you'll be sorry.
> 
> This could be implemented as a field.

Then fields can't be restricted to the ends of docstrings -- I want a
warning in the middle! And what do fields *do*? Seems to me they're simply
descriptive, not functional. Maybe they are all we need, but please come up
with a more complete description!

> I think that external URL
> hyperlinks should be implemented with colorizing, if at all.

They're definitely required. I used readability as the overriding criterion
in making that decision. Which is more readable?

1. A hyperlink in StructuredText, inline::

      I love using the "Python":http://www.python.org programming language!

      (The URL has to be stuck next to the reference, whether it flows or
      not. The raw text looks very different from the processed!)

2. A hyperlink in reStructuredText (based on the Setext style), indirect::

      I love using the Python_ programming language!

      (Note that the URL can be anywhere: next to the reference, at the 
      end of the section, or at the end of the document. And the URL can be
      referred to multiple times: Python_.)

      .. _Python: http://www.python.org
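
A rough Python sketch of how the indirect form could be resolved: collect the ``.. _name: URL`` targets, then substitute each ``name_`` reference. The function name ``resolve_links``, the single-word reference names, and the HTML output are all assumptions for illustration; nothing here is taken from the spec.

```python
import re

# Target lines such as ".. _Python: http://www.python.org"
TARGET = re.compile(r"^[ \t]*\.\. _(\w+): (\S+)[ \t]*$", re.MULTILINE)
# References such as "Python_" (a word followed by one trailing underscore)
REFERENCE = re.compile(r"(?<!\w)(\w+)_(?!\w)")


def resolve_links(text):
    """Replace name_ references with links to their targets."""
    targets = dict(TARGET.findall(text))
    body = TARGET.sub("", text)  # target lines do not appear in the output

    def link(match):
        name = match.group(1)
        url = targets.get(name)
        if url is None:
            return match.group(0)  # no target defined: leave text untouched
        return '<a href="{}">{}</a>'.format(url, name)

    return REFERENCE.sub(link, body).strip()


source = (
    "I love using the Python_ programming language!\n"
    "Did I mention Python_? Not Perl_.\n"
    "\n"
    ".. _Python: http://www.python.org\n"
)
result = resolve_links(source)
```

Because the target is looked up by name, it can sit anywhere in the text and serve any number of references, which is the property argued for above.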

> I don't
> think that internal hyperlink targets make sense for docstrings.

This comes back to the semantics or usage of docstrings, something that I'm
trying to avoid. How long can a docstring be?

> I don't think that comments are necessary for docstrings.  If you really
> want, you can include a Python comment before or after the docstring.

Comments are a freebie from the '.. ' syntax. Not necessary, but useful.

> Alternatively, comments could be done via colorizing..  

Please, no.

> > The cornerstone of the Setext/StructuredText-like approach is that
> > the raw text should be as readable as possible, even to the
> > uninitiated.
> 
> I don't see how directives win here.
>
> If anything, it seems like they
> will make it harder to read by the uninitiated, given the power of
> directives to use almost arbitrary syntax..

You seem to think that typing '.. some-directive::' will magically make
something happen. Not so. You'd have to first *implement* the directive, not
a trivial task.

I was referring to '@' and (especially) 'X<>', about the readability
cornerstone. OTOH, directives are readable by way of being explicit. If we
want a digibloofer construct, we say '.. digibloofer::' (having paid the
price for such impertinence by implementing the digibloofer-parsing
extension first, of course ;-).

> However, the idea that "raw text should be as readable as possible,
> even to the uninitiated" is a *goal* of mine, but not a cornerstone.
> Perhaps a cornerstone would be::
> 
>   Raw text should be readable, even by the uninitiated.

I don't see the distinction.

> There are a lot of conflicting goals in designing a markup language,
> and making it as readable as possible is by no means my most
> fundamental goal.

I'd say, for the Setext/StructuredText approach, it *is* the most
fundamental goal. If it's not yours, you'll save yourself a lot of grief by
using XML or TeX.

> In the case of colorizing, I believe that
> colorizing should *never* be necessary to the understanding of a
> docstring.. i.e., you should be able to strip away all colorizing, and
> still understand what it says.

In the Setext/StructuredText approach, you shouldn't *have* to strip away
anything. It should just be obvious, or at least unobtrusive.

> I guess that perhaps what it comes down to is that I am *not*
> necessarily trying to design a Setext/StructuredText-like language.

Aha! :-)

> I'm trying to design a markup language that is optimal for writing
> Python docstrings.  

A noble goal. Please use a different name for what you're doing and let's be
done with it. Lots of room for competition (the field's wide open right now!
;-). The more the merrier.

> In my mind, the only advantage of using
> `quotes` over C{curly braces} is that quotes are easier to ignore..

Precisely. Also, `quotes` have the connotation of, well, quoting.

... And a vigorous debate was had by all. Me and Edward, anyway. Thank you,
sir.

/DG