[Doc-SIG] Re: Grump about field lists

David Goodger goodger@users.sourceforge.net
Mon, 24 Sep 2001 23:01:33 -0400


Tony J Ibbs (Tibs) wrote:
> However, some field names are treated specially::
> 
>     :author: Me
>     :Date: 29th February 200
> 
> gives::
> 
>     <author> David Goodger
>     <date> 29th February 200

Please note, there is *no* automatic conversion of "Me" into "David
Goodger"! (I'm sure it was a typo. But could you imagine the
arrogance of a programmer adding such a "feature" to his code?)

> Note that I realise that this special casing is only done in the
> block at the *start* of the document (foreshadow_)...

That's important to the discussion.

> I believe that this special treatment is (a) because some fields are
> felt to be "special" and should thus be easy to extract from the
> tree, and (b) because when what evolved into field lists was being
> proposed on Doc-SIG, this was the sort of thing that was expected to
> happen for *all* field names (I gloss over history greatly, of
> course).

It's not that some fields are "special". "Bibliographic Field List
Context" is just a way of getting some extra functionality out of the
limited syntax.

I felt that the bibliographic elements (author, date, version, etc.)
were useful to a generic document, so added them to the DTD.
Naturally, I wanted to provide some mechanism to the reStructuredText
authors to include those elements in their documents. The past Doc-SIG
discussions and PEP syntax suggested that field lists were the way to
go. It's a bit of syntax overloading, but practical.

> However, I now find that I don't like this division of fields into
> "normal" and "special", for two broad reasons.
> 
> Broad reason 1: implementation
> ------------------------------
> When trying to output HTML from a DPS tree, it is useful if I can
> produce a decent result without any pre-processing of the tree.

So you don't want to deal with the bibliographic elements at all?
In order to deal with a richly structured document, you're going to
have to do some pre-processing, tree transformations, and addition
of boilerplate text here and there.

> For "normal" field lists, I have a *list* - that is, a grouping of
> adjacent field list items - which facilitates treating them *as* a
> group. For the "special" fields, there is no such linkage.

The bibliographic elements do not constitute a list. They do
constitute a group of elements; we could group them together in a
'docinfo' (or 'bibliographic' or 'metadata') element if that would
help. (But then we face the question: what goes inside 'docinfo', and
what doesn't? I think most people would agree 'title' is too generic
to go inside 'docinfo'. What about 'subtitle'? See the DocBook DTD for
a cautionary example: its 'bookinfo' etc. elements contain 'title', as
does the 'book' itself.)

I envisage these bibliographic elements being laid out in various
ways: like a title page of a book, like the first page of the Python
Library Reference, etc. I do *not* see them being laid out as a field
list, at least not exclusively, and not for anything but experimental
output (e.g., verbatim output of the raw input for verification
purposes).

For a typical bibliographic field list::

    Sand-Nymphs of Mars
    ===================

    :Author: Kilgore Trout
    :Contact: trout@fictional.org
    :Date: 24 September, 2001

I see this being laid out something like this::

                         SAND-NYMPHS OF MARS


                            Kilgore Trout
                        (trout@fictional.org)
                          24 September, 2001
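
A minimal sketch of that kind of layout, assuming a 70-column page and
a hypothetical ``title_page`` helper (neither is part of the DPS; the
parenthesized contact line simply mirrors the example above):

```python
def title_page(title, fields, width=70):
    """Lay out a title and bibliographic fields as centered lines."""
    # Upper-cased title, then two blank lines, as in the example layout.
    lines = [title.upper().center(width).rstrip(), "", ""]
    for name, value in fields:
        # Assumption: 'contact' is shown in parentheses; other values as-is.
        text = f"({value})" if name.lower() == "contact" else value
        lines.append(text.center(width).rstrip())
    return "\n".join(lines)


page = title_page(
    "Sand-Nymphs of Mars",
    [
        ("Author", "Kilgore Trout"),
        ("Contact", "trout@fictional.org"),
        ("Date", "24 September, 2001"),
    ],
)
print(page)
```

A book-style title page or a Library Reference-style front page would
be a different function over the same elements, which is the point of
not tying them to a field list.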

> This could, of course, be solved to *some* extent by insisting that
> these field list items *also* become children of a <field_list> node

I don't think this is a good idea. 'field_list' is just used as a
syntax vehicle; it's not the end result.

> (Hmm - and surely there *is* a Good Thing about being able to point to
> the Bibliographic Data *as* a subtree of the document.

Why?

If you just need an element around them for ease of processing, a
'docinfo' would be easy to add.

> Broad reason 2: theory
> ----------------------
> Having had my mind drawn to this, I think I have some general
> objections to the idea, anyway.
> 
> The first is simply that I don't think it is necessary. I think that
> it should be possible to handle any action that can be done with an
> ``<author>`` tag as easily with a ``<field>`` that has the correct
> subtree.

Yes, by identifying the field name and doing a tree transformation.
That's what's being done.
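
Roughly, the transformation amounts to something like the following
sketch. It is illustrative only, not the actual DPS code: the
``field_name`` and ``field_body`` element names, the contents of
``BIBLIOGRAPHIC_FIELDS``, and the ``transform_bibliographic`` function
are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumed registry; the real set of bibliographic field names may differ.
BIBLIOGRAPHIC_FIELDS = {"author", "contact", "date", "version", "status"}


def transform_bibliographic(field_list):
    """Turn registered fields into dedicated elements.

    Fields whose names are not registered are passed through unchanged
    as generic <field> elements.
    """
    result = []
    for field in field_list.findall("field"):
        name = field.findtext("field_name", default="").strip()
        body = field.findtext("field_body", default="").strip()
        if name.lower() in BIBLIOGRAPHIC_FIELDS:
            # Identify the field name, then replace the generic field
            # with an element named after it.
            element = ET.Element(name.lower())
            element.text = body
            result.append(element)
        else:
            result.append(field)
    return result


source = ET.fromstring(
    "<field_list>"
    "<field><field_name>Author</field_name>"
    "<field_body>Kilgore Trout</field_body></field>"
    "<field><field_name>History</field_name>"
    "<field_body>First draft</field_body></field>"
    "</field_list>"
)
for element in transform_bibliographic(source):
    print(ET.tostring(element, encoding="unicode"))
```

In this sketch "Author" becomes an ``author`` element while the
unregistered "History" field stays a plain ``field``.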

> And if it isn't, then transfer the field name to be an attribute

Same thing.

> The second is that I'm a bit unhappy with the ad-hoc nature of adding
> names - let us say that I have a document with a field name "History"
> (not unreasonable). What happens if reST introduces this as a standard
> tag at some time - suddenly, my customised Writer code will get *very*
> confused at the new parsing.

That's a backwards incompatibility issue, same as what happens in
Python when a new keyword is added (like 'yield').

Say we do add a new bibliographic field at some point in the future.
Some code would break. Since we're following good XP practice, we'd
write a unit test and see the problem right away. At this point the
TibsWriter software is part of the DPS/docutils package in Python's
standard library. If we don't fix TibsWriter before checking the
changes in to the Python codebase, the Python regression tests fail
and we get hell from Guido. We fix the problem, either by backing
out the changes or adapting TibsWriter to the changes.

Any third-party 'writers' out there that are not part of the core face
the same issues as with a Python syntax change. Hopefully there's
sufficient warning. If not, there's a small amount of maintenance to
be done. Hopefully we learn from any backlash that adequate warning
is not optional.

> Speaking of which, I especially don't like the out-reference to
> RCS/CVS keywords. The `RCS keyword recognition` section says that
> any RCS keyword shall be a bibliographic keyword.

No, it doesn't say that. It says "In the context of bibliographic
field lists". Perhaps that could use some explanation (below).

> This is unfriendly to the poor user, because it *requires* them to
> study a different document before they can figure out unused
> keywords

At first I was exasperated, thinking you were being exceedingly
pedantic ;-), but I'll assume that it's just the terseness of that
part of the spec that's to blame. Here's an expanded explanation
(and this is how the parser actually works):

    The RCS keyword processing only kicks in when all of these
    conditions hold:
    
    1. The field list is in bibliographic context (first non-comment
       construct in the document, after a document title if there is
       one).
    
    2. The field name is a registered bibliographic field name.
    
    3. The sole content of the field is an expanded RCS keyword, of
       the form '$Keyword: data $'.
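
The three conditions above can be sketched as a single check. This is
a simplified illustration, not the parser's actual code: the
``clean_field`` function, the ``in_bibliographic_context`` flag, the
regular expression, and the contents of ``BIBLIOGRAPHIC_FIELDS`` are
assumptions made for the example.

```python
import re

# Assumed registry of bibliographic field names (illustrative only).
BIBLIOGRAPHIC_FIELDS = {
    "author", "contact", "date", "version", "revision", "status",
}

# Condition 3: the whole field body is one expanded keyword,
# of the form '$Keyword: data $'.
RCS_KEYWORD = re.compile(r"^\$(?P<keyword>\w+): (?P<data>.+?) \$$")


def clean_field(name, body, in_bibliographic_context):
    """Strip RCS keyword cruft only when all three conditions hold."""
    if not in_bibliographic_context:               # condition 1
        return body
    if name.lower() not in BIBLIOGRAPHIC_FIELDS:   # condition 2
        return body
    match = RCS_KEYWORD.match(body.strip())        # condition 3
    return match.group("data") if match else body


# All three conditions hold: the cruft is tossed.
print(clean_field("Version", "$Revision: 1.12 $", True))
# Not in bibliographic context: left alone.
print(clean_field("Version", "$Revision: 1.12 $", False))
# Dollar signs inside ordinary text: left alone.
print(clean_field("Status", "Costs $5 per copy", True))
```

Anything that fails one of the checks comes back untouched, so a stray
dollar sign in ordinary text is not affected.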

The only people who are going to be putting RCS keywords in their
documents already know about RCS keywords, so there's (almost) no
danger of an accident. I can't see someone putting dollar-signs
around a word by accident. And for an RCS keyword to be expanded,
the file has to be *stored* under RCS or CVS, so if a keyword
accident does happen, the user has a greater problem than we need
address.

> (even if they're using DPS/reST to write a discourse on cats, or
> something inherently non-programming)

Then the .rtxt file wouldn't be stored under RCS/CVS.

> also because it nails us to a moving target over which we have no
> control (we can't stop CVS adding a new keyword to its list,
> unlikely as we may hope that to be).

An acceptable risk, I think.

> Also, in that section, "status" and "date" are named, but these are
> already mentioned in the previous section

Maybe the spec is misleading. '$Date: ... $' is not only processed in
the 'date' field. Any RCS keyword can be processed in any
bibliographic field. I was just using 'status' and 'date' as examples.

If this is too loose, we *could* define that '$Date$' is only
recognized in 'date' fields, '$Revision$' only in 'version' or
'revision' fields, etc. But then we limit the possibilities; I'm sure
there would be a complaint eventually.

> (and do you *really* think you can automatically convert all dates
> into ISO 8601 format?

Yes, I do. The RCS date field is defined as expanding to 'YYYY/MM/DD
hh:mm:ss' (in UTC).

(I just looked up the RCS manpage. It doesn't actually specify the
date expansion format, but from experience I know the above format to
be correct. If it is ever different, the date-specific pattern in the
parser won't match, and only the '$'s and 'Date:' would be removed and
the raw data left behind; no harm done.)
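
As a sketch of that behaviour (not the parser's actual patterns): the
``clean_date`` function and both regular expressions below are
assumptions, and so is the choice to keep only the date and drop the
time of day.

```python
import re

# Date-specific pattern for the format described above:
# '$Date: YYYY/MM/DD hh:mm:ss $'.
RCS_DATE = re.compile(
    r"^\$Date: (\d{4})/(\d{2})/(\d{2}) (\d{2}):(\d{2}):(\d{2}) \$$"
)

# Generic fallback: remove the '$'s and the 'Date:' label, keep the rest.
RCS_DATE_FALLBACK = re.compile(r"^\$Date: (.*?) ?\$$")


def clean_date(body):
    """Convert an expanded RCS date keyword to an ISO 8601 date."""
    match = RCS_DATE.match(body)
    if match:
        year, month, day = match.group(1, 2, 3)
        # Assumption: only the date part is kept.
        return f"{year}-{month}-{day}"
    # Unexpected format: leave the raw data behind, minus the cruft.
    match = RCS_DATE_FALLBACK.match(body)
    return match.group(1) if match else body


print(clean_date("$Date: 2001/09/24 23:01:33 $"))
print(clean_date("$Date: 24 Sep 2001 $"))
print(clean_date("24 September, 2001"))
```

A date that is not an RCS keyword at all, like the last call, passes
through unchanged.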

> If we're going to support RCS/CVS keywords directly, can we please
> prefix them with RCS (or CVS) - e.g., "CVSDate" or "cvsdate" - to
> make it clear what we mean?

That would defeat the purpose. We have bibliographic elements, like
'date'. People will use RCS keywords in their source files, because
they're automatically updated. The RCS keyword processing is merely a
cosmetic convenience, tossing the cruft.

> .. _foreshadow:: Hmm, that's another problem with the "special
>    treatment" - it only happens in one place. So I probably have
>    to be able to cope with the same "quantity" represented by
>    either of two means, anyway, if I want to go data-mining.

I don't follow; please explain.

> (Hmm - having said I thought that :title: was unlikely to happen,
> there it is for all to see.

You're referring to your docstring-develop message?

The reason for 'title' being recognized as a bibliographic field name
was for generality, and specifically to support the PEP header syntax
as an alternate reader/"input mode". For the PEP Reader, the
bibliographic elements should be extended to include all of the PEP
header fields. If the redundant inclusion of 'title' in the standard
set of bibliographic fields is painful, it would be easy to remove it
and only put it back for the PEP Reader.

-- 
David Goodger    goodger@users.sourceforge.net    Open-source projects:
 - Python Docstring Processing System: http://docstring.sourceforge.net
 - reStructuredText: http://structuredtext.sourceforge.net
 - The Go Tools Project: http://gotools.sourceforge.net