[Doc-SIG] Re: reStructuredText Markup Specification

Wolfgang Lipp castor@snafu.de
Wed, 6 Jun 2001 09:46:27 MET


Intro

        On Sun, 03 Jun 2001 10:34:24 -0400, 
        David Goodger wrote:
        > For section structure, indentation 
        > is unnatural and awkward.

    First of all, I would like to say that I find David's
    proposal overwhelming for both its comprehensiveness and
    its length. Pretty much everything that might be covered
    is indeed mentioned, up to the coverage of DOM.

    However, I would like to object strongly to the advertised
    abolition of indention to indicate document structure.

    In this posting, I would like to show, with a hopefully
    appropriate example and a detailed discussion following
    it, that all of David's fears concerning structure-by-
    indention are, with one small caveat, not substantial in
    my view.

Structure-by-Indention A Valid Generalization

    In contradiction to what David says, I find indent[at]ion
    natural and elegant, an appropriate and innovative means
    to express structure both in the application to
    programming language sources and to text markup. Moreover,
    the principle of hierarchical-structure-through-indention
    is applicable to a third domain, namely, to structured,
    'non-binary' data files.

    In other words, there is a chance not only to produce a
    standard for the limited use of docstrings, there is
    definitely a chance to set up a framework in which the
    sources of: (1) the programming language, (2a) its inline
    documentation and (2b) other information materials, as
    well as (3a) configurational files and (3b) databases(*)
    are interpretable.

            (*) as far as 'binary' formats are not deemed to
            be more appropriate for some purposes; but think
            of mid-sized address collections etc., where non-
            binary, user-editable formats are definitely a big
            plus.

    At first, this appears to be a gargantuan task, given the
    volume of the docstring proposals alone. It may also seem
    to be off-topic to mention configuration files when
    documentation is discussed. However, the distinction
    between a structured database, a configuration file and a
    piece of documentation is a rather superficial one: in all
    cases we have sequences of values that are (optionally)
    given names, where values in themselves may be either
    'terminal' (e.g., simple numbers or sequences of characters) or
    in turn contain more names with associated values: a name,
    therefore, may refer either to a single value or an
    arbitrarily deeply nested sequence of names and values. In
    a language like Python, the source symbolizes sequences of
    statements and expressions; again, where one entire group
    of statements is dependent on one single statement,
    indention is used to convey the scope of this dependency.


Structure-by-Indention vs. Structure-by-"Style"

    Missing important generalizations at key junctures of a
    development process may prove to be very expensive in the
    long run. As a case in point, let's have a look at the
    most popular configuration file format to date, the 'ini'
    format; this consists of an unordered list of items, each
    a name followed by an equals sign and a value, optionally
    divided into non-hierarchical sections indicated by
    bracketed names, such as:

        [Bolts]
        foo = 42
        bar0 = 3
        bar1 = 4
        [LeversTypeA]
        foo = 84
        outer_x = 10
        outer_y = 12
        inner_x = 8
        inner_y = 8
        [LeversTypeB]
        foo = 92
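
    For comparison, here is a minimal sketch of what Python's
    standard configparser module makes of the file above. The
    regrouping of 'bar0'/'bar1' and 'outer_x'/'outer_y' is done
    by hand, purely by naming convention; those expressions are
    my own illustration and not part of any 'ini' specification.

```python
import configparser

INI = """\
[Bolts]
foo = 42
bar0 = 3
bar1 = 4
[LeversTypeA]
foo = 84
outer_x = 10
outer_y = 12
inner_x = 8
inner_y = 8
[LeversTypeB]
foo = 92
"""

parser = configparser.ConfigParser()
parser.read_string(INI)

# Sections are flat: nothing says that the two Levers sections belong together.
sections = parser.sections()

# Every value comes back as a string; the format knows no other type.
bolts = parser["Bolts"]
foo = bolts["foo"]

# The sequence hidden in bar0, bar1 has to be rebuilt by name matching
# (plain string sorting; it would put a bar10 before bar2).
bar = [int(bolts[name]) for name in sorted(bolts) if name.startswith("bar")]

# Likewise the structure hidden in outer_x, outer_y.
outer = {
    name.split("_", 1)[1]: int(value)
    for name, value in parser["LeversTypeA"].items()
    if name.startswith("outer_")
}
```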

    It is easy to see that the specification of the 'ini'
    format fails to recognize 'sequences' and 'structures'. It
    is, therefore, very clumsy to express these concepts, when
    the need arises, in this format; commonly, kludges of the
    kind sketched above (numbered and compound names) are used
    as workarounds. A revised version along the lines of
    Python syntax and the original StructuredText proposal
    painlessly removes these shortcomings (beta implementation
    available as 'pylon.xcfg'; as an option, one could
    consider making a trailing '=' obligatory at the end of
    lines that start a block):

        Bolts
            foo = 42
            bar        # or, "bar = (3,4)"
                3
                4
        Levers
            TypeA
                foo = 84
                outer
                    x = 10
                    y = 12
                inner
                    x = 8
                    y = 8
            TypeB
                foo = 92

    A structure like this maps quite naturally onto Python's
    mappings and lists. It is also easy to see that the
    structure thus indicated is in principle neither different
    from a Python script, nor from a text with paragraphs and
    headings.
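
    To make the mapping concrete, here is a minimal sketch of a
    reader for the indented format shown above. It is not the
    'pylon.xcfg' implementation; the function names are
    hypothetical, and I assume that values are Python literals,
    that indention is done with spaces, and that '#' never
    occurs inside a value.

```python
import ast


def tokenize(text):
    """Return (indent, content) pairs, skipping blank lines and comments."""
    tokens = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # naive comment stripping
        if line.strip():
            tokens.append((len(line) - len(line.lstrip(" ")), line.strip()))
    return tokens


def parse_block(tokens, start, indent):
    """Parse the run of tokens at exactly `indent`; return (value, next_index)."""
    mapping, items = {}, []
    i = start
    while i < len(tokens) and tokens[i][0] == indent:
        content = tokens[i][1]
        i += 1
        if "=" in content:
            # "name = value": a terminal value with a name
            name, value = (part.strip() for part in content.split("=", 1))
            mapping[name] = ast.literal_eval(value)
        elif i < len(tokens) and tokens[i][0] > indent:
            # bare name followed by a deeper block: a nested structure
            child, i = parse_block(tokens, i, tokens[i][0])
            mapping[content] = child
        else:
            # bare value: an item of a sequence
            items.append(ast.literal_eval(content))
    if mapping and items:
        raise ValueError("block mixes named and unnamed values")
    return (mapping if mapping else items), i


def parse(text):
    tokens = tokenize(text)
    if not tokens:
        return {}
    return parse_block(tokens, 0, tokens[0][0])[0]


CONFIG = """\
Bolts
    foo = 42
    bar        # or, "bar = (3,4)"
        3
        4
Levers
    TypeA
        foo = 84
        outer
            x = 10
            y = 12
        inner
            x = 8
            y = 8
    TypeB
        foo = 92
"""

config = parse(CONFIG)
```

    With this, 'Levers.TypeA.outer' is simply
    config["Levers"]["TypeA"]["outer"], and 'bar' is a list.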

    Now, it is very simple to copy, for example, the section
    'Levers.TypeA.outer' to another place, even to another
    structural level, say 'Bolts'. You will then have to
    change indention; in case you forget that, either
    something ungrammatical or something with a different
    meaning will result. However, such a mistake would in this
    case be only of local impact (i.e., the place where the
    problem is recognized is the place where the problem
    actually occurs; also, one can delineate the offending
    construct and could choose to skip it in processing):

        Bolts
            foo = 42
            bar = (3,4)
                outer           # ungrammatical indention
                    x = 10
                    y = 12
        [...]

    (Of course, the ungrammaticality stems from certain
    assumptions made; a syntax where the above construct would
    yield, for example, {'bar': {'NN': (3,4), 'outer': {...}}},
    'NN' being a default name, is also conceivable.)
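
    The locality of the error can be sketched as follows. The
    checker is hypothetical and encodes just the one assumption
    made above, namely that a 'name = value' line may not open
    a block; it reports the line on which the offending block
    starts.

```python
def check_indention(text):
    """Return (line_number, message) for each block opened under a terminal."""
    problems = []
    previous_indent, previous_terminal = 0, False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].rstrip()  # naive comment stripping
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        if indent > previous_indent and previous_terminal:
            problems.append((number, "block indented under a terminal value"))
        # a line containing '=' is taken to carry a terminal value
        previous_indent, previous_terminal = indent, "=" in line
    return problems


PASTED = """\
Bolts
    foo = 42
    bar = (3,4)
        outer           # ungrammatical indention
            x = 10
            y = 12
"""

problems = check_indention(PASTED)
```

    The only complaint is about the 'outer' line itself, which
    is where the mistake was made.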
    
    However, in David's proposal, the same structure would
    look something like this (please correct me):


        ======
        Bolts
        ======

        foo = 42
        bar = 3,4 # or something like this

        ======
        Levers
        ======

        TypeA
        -----

        foo = 84

        outer
        .....
        x = 10
        y = 12

        inner
        .....
        x = 8
        y = 8

        TypeB
        -----

        foo = 92

    
    Now, people, please excuse me! but you do not want me or
    anyone to believe that this is a clearer, more obvious,
    'self-documenting', more maintainable format, do you?
    
    Does anyone honestly think this gets *any* better only
    because documentation has typically *more* material
    between section headings?
    
    Next, consider what happens when you want to copy
    Levers.TypeA.outer to Bolts:


        ======
        Bolts
        ======
        [...]   

        outer     -+
        -----      | <---+    
        x = 10     |     |
        y = 12    -+     |
                         |
                         |
        ======           |
        Levers           |
        ======           |
        [...]            |
                         |
        outer     -+     |
        .....      +-----+
        x = 10     |     
        y = 12    -+     

    
    To me, it is completely non-obvious that the markup of
    element 'outer' has to change from one arbitrary style
    (dotted) to another arbitrary style (dashed). Nothing in
    these styles indicates their hierarchical meaning. Also,
    in case I forget to change that markup, I do not get an
    error anywhere near the offending line -- since markups are
    determined ad-hoc by precedent, it may in fact be the
    *following* section that looks ungrammatical. This kind of
    markup is difficult to understand and hard to maintain.
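
    The effect can be sketched in a few lines, under the
    assumption (mine, a simplification of the proposal) that
    each underline character receives its level from the order
    in which it first appears, and that a heading may not skip
    a level. Overlines are ignored and the function name is
    illustrative; this is not the reStructuredText parser.

```python
import re

# a line consisting of one punctuation character repeated
RULE = re.compile(r"^([^\w\s])\1+$")


def heading_levels(text):
    """Return (line_number, title, level, consistent) for underlined titles."""
    lines = [line.strip() for line in text.splitlines()]
    styles = []          # underline characters in order of first appearance
    previous_level = 0
    report = []
    for index in range(1, len(lines)):
        title, rule = lines[index - 1], lines[index]
        if not title or RULE.match(title) or not RULE.match(rule):
            continue
        char = rule[0]
        if char not in styles:
            styles.append(char)
        level = styles.index(char) + 1
        consistent = level <= previous_level + 1  # no skipped level
        report.append((index, title, level, consistent))
        previous_level = level
    return report


# 'outer' pasted under Bolts with its old dotted underline left unchanged
PASTED = """\
======
Bolts
======
foo = 42

outer
.....
x = 10
y = 12

======
Levers
======

TypeA
-----
foo = 84
"""

flagged = [title for _, title, _, ok in heading_levels(PASTED) if not ok]
```

    In this model the pasted 'outer' passes without complaint,
    because the dots now come second and so claim level two;
    it is the untouched 'TypeA' further down that gets flagged.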


Refutation

    I would now like to detail my objections against the
    reasons David put forth to show the unfitness of indention
    as a means to indicate structure in natural-language
    texts. Quoting David's posting (see the end of this
    message for the original), these reasons are:

        (1) Using indentation is [u]nnatural 

            (1a) Most published works use title style (type
            size, face, weight, and position) and/or
            section/subsection numbering rather than
            indentation to indicate hierarchy. When
            indentation is used, it is usually the formatted
            end-result and is there for aesthetic rather than
            structural purposes.

            (1b) [T]he style of the section title should
            indicate its structure. [...] In fact, [section
            structure through title style] is already in
            widespread use in plain text documents, including
            in Python's standard distribution (such as the
            toplevel README_ file).

        (2) Using indentation is [a]wkward.

            (2a) One must think about the formatting as the
            text is keyed in.

            (2b) And when structural changes are made (it is
            very common during the composition of a document
            to rearrange sections and their hierarchy) we must
            use block-indent and -unindent functions.
            
            (2c) In order to edit documents using indentation,
            relatively advanced text editors must be used.

        (3) Applying indentation to ordinary written text is
        hypergeneralization.

    Following are my objections.
    
    1:  Indention Unnatural?
    
        Indention is 'natural' -- that's why C programmers use
        it although they don't have it, that's why GvR chose
        indention for Python although he didn't have to.
        Indention is a typographical device that came into
        widespread use at least with the spread of the
        typewriter, and what are plain text editors but
        software typewriters?
        
        It is a near-no-no *not* to use indention in *some*
        places, even in places where other typographical means
        of expression *are* available (verse, quotes, mottos,
        abstracts...), or where it is redundant (unindented
        code is almost tantamount to obfuscated code).
        
        It is interesting to see that David's proposal *does*
        keep indention for these cases:

            * lists
            * term definitions
            * literal blocks
            ... more?

        Therefore, indention is indeed highly 'natural' and
        also 'appropriate'. I try to show this in the
        discussion of almost every single question in this
        posting. True, it may be painful in cases where you
        must use an editor that is weak on this point. Above,
        I even wrote a list of two integers in indented style;
        most of the time a list notation would be clearer
        (short items: parentheses -- long items: indention).
        For this and other reasons, it may be worthwhile to
        consider a syntax that allows both indention and other
        means (such as parentheses) to indicate document
        structure, so both are always available options.
        (Python, of course, uses parentheses for mappings,
        lists, and tuples, but indentions for classes,
        functions, etc.; here, both means are used, but not
        interchangeably so).


    1a: Published Works Don't Do It, Let's Do It
        
        As stated above, professionally typeset and published
        materials do in some cases use indention, but mostly
        not for the use of indicating section structure.
        However, 'real' typography, as David notes, too, has
        means that are simply not available in a 'typewriter'
        situation. And, when you open a book, you will see
        that it is, in fact, not only the size and position of
        titles that matter -- it is also the space given to
        them: Chapters often commence only on uneven pages,
        leaving an entirely blank page to their left, sections
        have a considerable amount of space above and some
        more below them. Granted, we also meet with the
        occasional embellishment here and there, which may
        take on the shape of a line.
        
        However, David's argument in itself is a bit of a
        problem because it is claimed that we users of editors
        should mimic the typographers in their ways (and
        shun indention, because, you see, in books they don't
        use it either), while it is at the same time
        acknowledged that we lack the means to do so (we don't
        have big type, so let's use underline). This is
        contradictory.
        
        Secondly, it is stated that indention "is usually the
        formatted end-result and is there for aesthetic rather
        than structural purposes" -- well, it seems to me that
        David's underlined section headings are rather
        'aesthetically' than 'structurally' motivated, at
        least when compared to indention.
        
        Also, it is a little bit of a folly to throw in the
        concept of 'aesthetic' at this point of the discussion
        and expect people to understand this to be a kind of a
        bug, a failure, a misunderstanding, a faulty approach
        to consider 'aesthetic' aspects when what you really
        wanted was clarifying 'structure'. If those books with
        their typesizes and empty spaces were not 'aesthetic',
        who would read them? If our design of typeset text
        were not 'aesthetic', could we even manage to write
        them? If not about the 'aesthetics', what else is it
        that we talk about here? If aesthetics are 'out', I
        guess then parentheses are 'in': indention is not the
        easiest to parse, so why bother? 
        
        It is precisely 'aesthetics' and almost nothing but
        'aesthetics' that we talk about here. This is a 99%
        pure 'aesthetics' discussion, the rest being
        feasibility (in Python code). It is not 'practical',
        not from the computer's point of view, to have
        scripting languages -- that's a mere fuss, a waste of
        CPU cycles. No computer 'writes' documentation, no
        computer cares about the look of documents, drafted or
        printed. It is we who are doing this, and what we are
        looking for is a practical, manageable, readable and
        pleasing way of doing the job; in other words, we are
        looking for a beautiful solution to the problem.
        
        However, the proposed solution of over- and
        underlined section titles, while it may have some
        visual appeal (that a line put into a comment would
        also lend), fails to fulfill the promise that it is a
        practical means (as demonstrated when we tried copying
        elements in the last section). Surprisingly, we also
        lack the technical means to conveniently manage over-
        and underlining of headlines, as I try to show in
        point (2c), below.
                
        So, typographers don't do it (well, not all of the
        time), but they have other means. We are programmers
        and documentation authors, working on software
        typewriters, so let's do it.
        

    2:  Indention Awkward?
        
        Indention is elegant. Trying to convince Python people
        of the elegance of indention is unnecessary, they're
        already convinced of this. It should be hard for a
        programmer to accept a scheme that is purportedly
        'natural' in its indication of 'structure' when it
        uses arbitrary, highly context-sensitive and ambiguous
        lines instead of (any or all of) indention,
        parentheses, begin-end-commands, i.e. those means that
        are, for a programmer, the most logical choices.
        

    2a: Think While You Type?
    
        David, no. You are with your text when you write it,
        aren't you? And what, please, is the application of a
        proper (and arbitrary) line style but an activity that
        necessitates a certain amount of awareness? I reject
        this point.


    2b: Structural Changes Difficult?
    
        The main points to be made here have already been
        discussed in the previous section, entitled
        'Structure-by-Indention vs. Structure-by-"Style"'. Let
        me add here that I consider it one of the vices --
        or omissions -- of HTML and most markup schemes that
        authors are forced to indicate all section levels
        explicitly. As David says, the writing of a document
        is a process where many changes even in the structure
        of a document are made. But how often does the author,
        who for some reason tackles their source with a plain
        editor, have to go through all those tags, exchanging
        all the numbers in all the tags, twice for every
        heading, upon finding an unsatisfactory structuring of
        the document! Of how little help all the advanced
        regular expression replacement tools are in this case!
        How big the surprise on finding out that with the
        newfangled docstring format they will find themselves
        at very much the same impasse again! How much would
        they love to even use M$W*rd, if only for the outline
        view!
        Outline view with symbolic indention! Isn't this
        cornerstone software of the evil WYSIWYG empire one of
        the most unlikely places in the universe to find
        concrete indention being replaced by abstract,
        symbolic indention? But it works, and it's easy:
        change structure, no problem, drag, drop, all formats
        cared for.
        
        Indention is the single trick that allows users with
        plain text editors to prove they're no dummies when it
        comes to restructuring. David complains about having
        to change the indention all the time. But this is a
        feature, not a bug. It is well intended that the level
        of the section is *not* written down. It is done due
        to the insight that a subsection is not different from
        a subsubsection: the latter only happens to be at a
        structurally deeper level than the former.
        Accordingly, in indented text, what you do in order to
        *move* the level is you *move* the text. So much for
        the 'Indention Is Unnatural' argument.
        
        If someone thinks one must have concrete, absolute
        section levels, and there may be situations where they
        are advantageous, please make a proposal that shows
        the user section levels and not squiggly vs. dotted vs.
        dashed single and double lines. I suggested elsewhere
        to introduce proper commands (I find the 'directives'
        of the proposal wholly unsatisfactory, especially in
        the context of a scripting language), and I used
        double semicolons for the demonstration. Therefore,
        one could use
        
            ;;h My Title
                The body of the section 
                goes into subsequent blocks.

                ;;h Another Title
                    The body of the subsection 
                    goes into subsequent blocks.

        for relative heading-body pairs and 

            ;;h3 My Title
                The body of the section 
                goes into subsequent blocks.

            ;;h+1 Another Title
                The body of the subsection 
                goes into subsequent blocks.

        for absolute and relative, explicit markups.
        Additionally, again as stated elsewhere, I think it is
        advantageous and more systematic to introduce
        explicit, if somewhat lengthy, extensible, self-
        explaining commands and only then associate these, as
        far as there is need and mutual agreement, with
        typographic situations ("Single line, no punctuation
        at end, followed by indented blocks" and so on). This
        kind of procedure gives authors much more orientation
        and feature-safety.
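
        As a rough sketch of how such heading commands could be
        resolved to levels: the semantics below are my working
        assumptions for the demonstration, not a finished
        design. A bare ';;h' takes its level from its
        indention, ';;h3' is absolute, and ';;h+1' is relative
        to the preceding heading. The function name and the
        indent width of four are likewise only illustrative.

```python
import re

# ';;h', ';;h3' or ';;h+1', followed by the title text
COMMAND = re.compile(r"^;;h([+-]?\d+)?\s+(.+)$")


def resolve_levels(text, indent_width=4):
    """Return (level, title) for every ';;h' heading command in text."""
    previous = 0
    headings = []
    for line in text.splitlines():
        stripped = line.lstrip(" ")
        match = COMMAND.match(stripped)
        if not match:
            continue
        spec, title = match.groups()
        if spec is None:
            # relative by position: the level follows from the indention
            level = (len(line) - len(stripped)) // indent_width + 1
        elif spec[0] in "+-":
            # explicit, relative to the preceding heading
            level = previous + int(spec)
        else:
            # explicit and absolute
            level = int(spec)
        headings.append((level, title.strip()))
        previous = level
    return headings


BY_INDENTION = """\
;;h My Title
    The body of the section
    goes into subsequent blocks.

    ;;h Another Title
        The body of the subsection
        goes into subsequent blocks.
"""

EXPLICIT = """\
;;h3 My Title
    The body of the section
    goes into subsequent blocks.

;;h+1 Another Title
    The body of the subsection
    goes into subsequent blocks.
"""

by_indention = resolve_levels(BY_INDENTION)
explicit = resolve_levels(EXPLICIT)
```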
    
    
    2c: Indention-Capable Software Not Available?
    
        In point (2c), David says that one "must use block-
        indent and -unindent functions[, features of]
        relatively advanced text editors". Well, at least we
        *do* have *some* editors that have functionality to
        perform the indenting and unindenting of groups of lines
        -- can anyone name a text editor that has a similar
        functionality to perform underlining? Can anyone,
        please, point out a text editor that does all of
        these:
        
            * do *both* over- *and* underlining,
            
            * keep track of the characters in over- and
            underlining being the same,
            
            * keep both over- and underline at the same
            length,
            
            * keep both over- and underline at least as long
            as the right edge of the intervening title.
        
        I do not know any editor with any of these
        capabilities (or 'awarenesses'). Sure, you can write
        an Emacs macro to do that, but then, Emacs is exactly
        that kind of "relatively advanced" software that
        David does not want to be forced to use (nor do I).
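
        For illustration only, this is roughly the bookkeeping
        from the list above that an editor, or failing that a
        script, would have to do for the author. The function
        name and its parameters are hypothetical.

```python
def titled(title, char="=", overline=False):
    """Return title with an underline (and optional overline) of matching length."""
    if len(char) != 1 or char.isalnum() or char.isspace():
        raise ValueError("the rule must be a single punctuation character")
    rule = char * len(title)  # same character, same length as the title
    lines = [rule, title, rule] if overline else [title, rule]
    return "\n".join(lines)


levers = titled("Levers", "=", overline=True)
outer = titled("outer", ".")
```

        Every change to a title means running something like
        this again, by hand or otherwise.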

        Moreover, if indention is only available in
        "relatively advanced text editors", as David observes,
        then, please! where is the editor, apart from Emacs,
        that supports the proposed table format? I know only a
        very few that support a 'line drawing mode' (i.e.,
        moving the cursor leaves a line as trace; linestyles,
        intersections etc.), but that is a very far cry from
        being able to draw (or manage) *tables*.

        I for sure am one who, as a reader, would definitely
        enjoy more readable tabular data in plain text. As an
        author, however, I am loathe to find myself being
        obliged to use Emacs (and I know the program) only
        because that's the only software in the world that
        knows how to decently handle ASCII tables (as an
        *optional* format I can, of course, only welcome the
        proposal for tables).

        It is not quite clear to me how to sell this: First,
        the well-established device of indention is more or
        less (but not entirely) thrown out, partly on the
        grounds that current text editors are purportedly not
        able to handle it (or perhaps make it difficult to use
        "block-indent and -unindent functions" -- I use the
        tab key for that purpose). Then, a format for section
        headings is suggested that current editing software is
        plainly ignorant about. Next, a table format is
        introduced that in 99% of all editors turns out to be
        plain hell as soon as a single cell has to change size
        (try it once). This argumentation fails to convince me.


    3:  Indention A Valid Generalization
        
        I contend that indention is a valid generalization to
        indicate the structure of a given text. It is precisely
        the generalization expressible by indention that is
        missing in, for example, HTML: In HTML, you put one
        heading into the text, then a paragraph, then another
        heading, again followed by a paragraph. While this in
        itself suffices to indicate the structure, it is not
        quite obvious why, in another case, both list items,
        which are members (dependants) of a list, and the list
        itself are made structurally explicit. Clearly, this
        difference in treatment is unjustified, although
        practical reasons may be found. In theory, an HTML
        markup should (and, syntactically, could) look
        something like this:

            <section>
                <title>My Title</title>
                <body>
                    <p>The body of the section 
                    goes into subsequent blocks.</p>
                    <section>
                        <title>Another Title</title>
                        <body>
                            <p>The body of the subsection 
                            goes into subsequent blocks.</p>
                        </body>
                    </section>
                </body>
            </section>

        Of course, this is sort of a markup-overkill for the
        weathered indentionist.
                    
        The drawback of the HTML view is simply that the
        structure of the markup is not as congruent with the
        conceptual structure of a document as would be
        possible.
        
        Conceptually, a chapter 'has' a 'heading' and
        'contains' 'text', which in turn may be divided into
        'sections' and so on.
        
        In HTML, however, a 'heading' has a 'level' and
        'precedes' a 'text', and perhaps another 'heading' of
        another 'level'.
        
        In the first view, it is *level* that follows from the
        structure, while in the second, it is the *structure*
        that must be deduced from the 'levels'. This, indeed,
        is a rather decisive difference.
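
        The two views can be put side by side in a short
        sketch. The data layout (dicts with 'title' and
        'sections' keys) and the function names are my own
        choices for the illustration.

```python
def nest(headings):
    """Second view: deduce the structure from a flat list of (level, title)."""
    root = {"title": None, "sections": []}
    stack = [(0, root)]  # (level, node) of the currently open sections
    for level, title in headings:
        # close every open section that is not an ancestor of this heading
        while stack[-1][0] >= level:
            stack.pop()
        node = {"title": title, "sections": []}
        stack[-1][1]["sections"].append(node)
        stack.append((level, node))
    return root["sections"]


def flatten(sections, level=1):
    """First view: the level simply follows from the depth of nesting."""
    pairs = []
    for section in sections:
        pairs.append((level, section["title"]))
        pairs.extend(flatten(section["sections"], level + 1))
    return pairs


flat = [(1, "My Title"), (2, "Another Title")]
tree = nest(flat)
round_trip = flatten(tree)
```

        Going from structure to levels is a trivial walk;
        going from levels to structure needs a stack and a
        rule for what closes what.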
        
        According to David's proposal, docstrings would suffer
        from the same lack of sound generalization, with all
        its difficulties, as HTML documents.
        

One More Remark, A Caveat And Conclusion

    Apart from the treatment of indention in the proposal, I
    also have some doubts about the fitness of the proposed
    markups for definition lists (number 8 of the proposal)
    and literal blocks (number 9). In the first case, the
    proposed markup appears somehow too volatile to me, in the
    second, it is quite arbitrary. Again, wouldn't we be
    better off with markup to signify 'commands' or 'role
    indicators'? Then, taking ';;' for the purpose of
    demonstration, we could have

        ;;glossary
            foo
                A subspecies of gnu.

                doo-foo
                    Mythical animal; a winged foo.

            rants
                Wide-eyed geckoes.

        as well as

            ;;lit
                Literally, a line.

    It is, then, still possible to associate more
    unobtrusive, less explicit formatting characteristics
    with these or other data formats; however, that choice
    would be much more configurable and explicit than the
    procedures presently proposed. I think it is more
    promising to make an extensible, explicit scheme and then
    allow shortcuts to those features than to bind some non-
    explicit markup early in the decision process to some very
    specific purpose.

    (BTW, is there a distinction between 'literal' segments
    and 'code' segments? That would be important for coloring
    and formatting).

    Now for the caveat announced above. Yes, the proposal is
    right, those underlines do somehow stick out, I admit
    that. But, isn't that more appropriately effected with a
    line of dashes in a comment within the docstring? 
    
    Since lines of dashes and the like would then be free
    again, I suggest that a concrete markup be used for
    horizontal rulers. HRs are very practical in long texts,
    although typographers and web design advisors discourage
    their use (but those people don't deal with long-running,
    single-page technical documentation). And what markup
    could be more suggestive than lines made up of whitespace
    plus nothing but repetitions of any one of these
    characters: '-.,;:_#+*~' (etc.)?
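
    Recognizing such a ruler takes one regular expression. The
    minimum length of three characters is an assumption of
    mine, as is the function name; the character set is the
    one listed above, without the 'etc.'.

```python
import re

# optional whitespace, then one ruler character repeated at least three times
RULER = re.compile(r"^\s*([-.,;:_#+*~])\1{2,}\s*$")


def is_ruler(line):
    """True if line is whitespace plus repetitions of one ruler character."""
    return RULER.match(line) is not None
```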

    Concluding, I urge everybody not to abolish indention.
    That wouldn't be very Python, I'm afraid.


Wolfgang Lipp
castor@snafu.de
lipp@epost.de


full quote:
    >3. Structure via Indentation
    >============================
    >Setext_ required that body text be indented by 2 spaces. The original
    >StructuredText_ and StructuredTextNG_ require that section structure be
    >indicated through indentation, as "inspired by Python". For certain
    >structures (outlines, lists, literal blocks, block quotes) indentation
    >naturally indicates structure or hierarchy. For section structure,
    >indentation is unnatural and awkward. Rather, the style of the section
    >title should indicate its structure.
    >In the original StructuredText, sections consist of one-line title
    >paragraphs followed by indented paragraphs and other body elements. Using
    >indentation is:
    >- Unnatural. Most published works use title style (type size, face, weight,
    >  and position) and/or section/subsection numbering rather than indentation
    >  to indicate hierarchy. When indentation is used, it is usually the
    >  formatted end-result and is there for aesthetic rather than structural
    >  purposes.
    >- Awkward. One must think about the formatting as the text is keyed in. And
    >  when structural changes are made (it is very common during the
    >  composition of a document to rearrange sections and their hierarchy) we
    >  must use block-indent and -unindent functions. In order to edit documents
    >  using indentation, relatively advanced text editors must be used.
    >Python's significant whitespace is a wonderful innovation (even if not
    >original to Python), however applying indentation to ordinary written text
    >is hypergeneralization.
    >reStructuredText_ indicates section structure through title style (as
    >exemplified by this document). This is far more natural. In fact, it is
    >already in widespread use in plain text documents, including in Python's
    >standard distribution (such as the toplevel README_ file).