[Doc-SIG] substitutions or tagging references or inline directives (was RE: Alternative inline markup)

Fri, 09 Nov 2001 22:25:24 -0500

[I'm splitting up the discussions because they really are independent
issues. Keeping up one thread is confusing. ("My brain hurts!") Please
keep the threads separated, or my brain will explode. Thanks.

("Oh my God he's burst his brain!")

This post contains replies to a bunch of other posts. I've tried to
pull them into some semblance of coherence, but there may be
redundancy, ramblings, and misquotes. Apologies in advance.]

[Paul]
> 1. I'm not keen on the overloading of the \` character in the
>    syntax. Substitutions look like interpreted text.

The slashes don't stand out enough? Perhaps a more obtrusive markup is
in order? ;-) Seriously though, suggestions for alternative syntax are
welcome.

(Note: You don't actually have to escape the backquote in these and
other cases: ` "`" (`). Try it with the parser!)

> 2. I'm not 100% clear on the semantics, either. To my mind, pure
>    text substitutions are a bad thing - put the text inline instead.
>    Otherwise, you are making the marked up text *less* readable.
>    `/swim/`?

> .. /swim/ See what I mean

Yes. They're easily abused. Perhaps any use is abuse. We could remove
the replacement text aspect and leave the directive only. That would
mean that an extra level of indirection plus a directive is needed for
the abuse::

    Do you `/swim/`?

    .. /swim/ text:: See what I mean

This type of abuse cannot be prevented, but the extra step would
discourage potential abusers.

Yes, you've convinced me to remove "replacement text" from the spec.
That makes substitutions easier to understand as well: just one case.

>    And the semantics of anything else is effectively
>    output-processor dependent.

I don't see how. As it stands, given this input::

    `/Peace/`, `/love/`, and `/Tux/`. (It's an IBM ad. Tux is the
    Linux penguin.)

    .. /peace/ image:: peace.png
    .. /love/ image:: heart.png
    .. /tux/ image:: tux.png

The parser will turn it into::

    <document>
        <paragraph>
            <substitution_reference refname="peace">
                Peace
            , 
            <substitution_reference refname="love">
                love
            , and 
            <substitution_reference refname="tux">
                Tux
            . (It's an IBM ad. Tux is the
            Linux penguin.)
        <substitution name="peace">
            <image uri="peace.png">
        <substitution name="love">
            <image uri="heart.png">
        <substitution name="tux">
            <image uri="tux.png">

After transformation, this will turn into::

    <document>
        <paragraph>
            <image uri="peace.png">
            , 
            <image uri="heart.png">
            , and 
            <image uri="tux.png">
            . (It's an IBM ad. Tux is the
            Linux penguin.)

This is still output-processor-independent. User-defined directives
may well be output-processor-dependent though; it can't be avoided.
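
To make the transformation step concrete, here is a minimal Python
sketch. It is an illustration only: the ``Node`` class and the
``resolve_substitutions`` function are hypothetical names, not the
parser's actual API, and the sketch assumes that substitution
definitions sit at the top level of the document, as in the example
above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Node:
    """Hypothetical stand-in for a document tree element."""
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List[Union["Node", str]] = field(default_factory=list)


def resolve_substitutions(document: Node) -> Node:
    """Return a copy of the tree with substitution references resolved."""
    # Collect the contents of each top-level substitution definition.
    definitions = {
        child.attrs["name"]: child.children
        for child in document.children
        if isinstance(child, Node) and child.tag == "substitution"
    }

    def rewrite(node: Node) -> Node:
        new_children: List[Union[Node, str]] = []
        for child in node.children:
            if isinstance(child, str):
                new_children.append(child)      # plain text is untouched
            elif child.tag == "substitution":
                continue                        # definitions are dropped
            elif child.tag == "substitution_reference":
                # Raises KeyError if no definition matches the reference.
                new_children.extend(definitions[child.attrs["refname"]])
            else:
                new_children.append(rewrite(child))
        return Node(node.tag, dict(node.attrs), new_children)

    return rewrite(document)


# The IBM ad example, built by hand instead of by the parser.
doc = Node("document", children=[
    Node("paragraph", children=[
        Node("substitution_reference", {"refname": "peace"}, ["Peace"]),
        ", ",
        Node("substitution_reference", {"refname": "love"}, ["love"]),
        ", and ",
        Node("substitution_reference", {"refname": "tux"}, ["Tux"]),
        ". (It's an IBM ad. Tux is the Linux penguin.)",
    ]),
    Node("substitution", {"name": "peace"}, [Node("image", {"uri": "peace.png"})]),
    Node("substitution", {"name": "love"}, [Node("image", {"uri": "heart.png"})]),
    Node("substitution", {"name": "tux"}, [Node("image", {"uri": "tux.png"})]),
])
result = resolve_substitutions(doc)
```

Nothing in the sketch depends on the output format: it only rearranges
the tree, and the ``image`` elements are left for whatever writer comes
next.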

>    Even your image example (the least contentious possibility for
>    this sort of thing) may not be renderable in certain output
>    formats (ASCII text, for instance!!!).

That's not a valid argument. Converting to ASCII text almost
invariably loses information. Almost *none* of the reStructuredText
constructs are directly representable in ASCII text.

90% of web content depends on graphics.

Somebody could insert one of those bitmap-to-ASCII-matrix converters,
resulting in huge ASCII graphics displays that you have to stand back
from and squint to see.

> 3. Overall, I'd like to see clearer examples of what all this could
>    be *for*. ... I'd like to see a "motivation" section, with more
>    of this sort of example.

So would I! As I get around to it, I will add them. Documentation
patches are welcome!

> .. [1] Actually, having used the syntax in my example above,
>        I find I quite like it. But point (2) still stands - it's not
>        clear (to me) that the construct is *useful*.

I think substitutions will be useful at times, but I haven't had a
real need for them yet. Now that the syntax is there, I may find
occasion to use it.

Most importantly, substitutions tie up the last loose end that would
prevent the reStructuredText spec from being "complete". There are
sets of structural and body elements, with directives to extend them
as required. There's a set of inline elements, now with substitutions
to extend *them* too.

> > I've asked this before, and I really would like to know: what does
> > the "lj" tag *do* in the end? Can you show us some HTML output?
> 
> That gets back to my point - what's the motivation? And for me, I'd
> like to see more than just HTML output. What would such a document
> show in PDF/PostScript intended for printing? (If the answer is "you
> don't use it in that context", then we're getting too
> domain-specific).

If the "lj" tag is meant for an internal Wiki application, then the
lack of printed-output support is not really an issue. I suggested
HTML because I wanted to understand the problem better.

> And the converse is, that *any* request for extra features should be
> addressable with "use a (substitution|directive)". If not, then we
> need to rethink these constructs, to understand why they aren't
> doing their job properly.

Yes.

> Hmm, by my own argument, that implies that roles should be
> *replaced* by substitutions. Maybe the fact that they can't means
> that there is still something to address here. Substitutions don't
> take "parameters" (the interpreted text part of a role). Roles don't
> have the "supplemental information" (the directive-like bit in the
> ``..`` section - I don't know what to call it) that substitutions
> do.

See my post "Clarification: interpreted text vs. directives vs.
substitutions".

[Alan]
> Instead of ``:attribute:`Fred``` you write ```attribute:: Fred```.
> Directives are slightly expanded to subsume the role of roles. And
> if you ever want to give that "class" directive/role some arguments
> or otherwise make it more complex, you can do that, which you
> couldn't with the current roles.

See my post "Clarification: interpreted text vs. directives vs.
substitutions".

[Alan]
> > > 2) An underscore suffix currently modifies the preceding text by
> > >    making it a link. This notion is extended - the suffix
> > >    indicates that the text is to be tagged in some way,
> > >    indicated by a directive or destination URL in the target::
> > > 
> > >       I had lunch with Jonathan_ today.  We talked about Zope_.
> > > 
> > >       .. _Jonathan: lj [user=jhl]
> > >       .. _Zope: http://www.zope.org/

[David]
> > Interesting idea, putting arbitrary constructs in the link target.
> > However, for consistency that depends on two things:
> > 
> > 1. The link text remains behind, untouched except for being
> >    "activated" in some way.
> > 2. There must *be* a link target. Corollary: the reference must
> >    *be* a reference.

[Alan]
> I agree with (2) but not (1).

That's the opposite of what I first expected to hear. Just to confirm:
what I meant was, given this input::

    this is a `link to something`_

The output would contain the (possibly formatted but otherwise
*unaltered*) text::

    this is a link to something

For hyperlink references, I absolutely require (1). I'd be willing to
entertain an extension such that (2) is no longer true (i.e., it
depends on the processed contents of the "target"), but not the other
way around.

> Here's the principle I'm going on: A reStructuredText-to-plaintext
> converter should modify the non-directive parts of the document as
> little as possible. The marked-up text should "read" like
> non-marked-up text.

I don't see how this follows from the above.

> > What will "Jonathan" become?
> 
> ``<tag name="lj" args="[user=jhl]">Jonathan</tag>`` or some such.
> After that, it's an output format issue.
> 
> For the given application I would expect the default text output to
> be ``Jonathan`` and the default HTML output to be::
> 
>     <a href="/userinfo/jhl"><img src="/icons/jhl"></a>
>     <a href="/users/jhl"><b>Jonathan</b></a>

So they *are* hyperlinks, in this case at least.

> > For example, say I want Jonathan's user
> > icon to appear in my paragraph::
> > 
> >     I had lunch with [Jonathan's icon here] today.
> > 
> > How do I do this *without* having a hyperlink at the same time?
> 
> The way you'd write this paragraph in plaintext is::
> 
>     I had lunch with Jonathan today.
> 
> This implies that the reStructuredText paragraph should be::
> 
>     I had lunch with Jonathan_ today.
> 
> Then follow it with::
> 
>     .. _Jonathan: lj-icon jhl
> 
> or the like. If you're really referring to the icon itself, rather
> than referring to Jonathan but using his icon in graphical output,
> then you'd say something like::
> 
>     Jonathan has a goofy icon: `Jonathan's icon`__
> 
>     __ lj-icon jhl

It still looks like a hyperlink to me. My objection remains: if we
change the meaning of "_", then when looking at a *reference*, I can't
tell if it's a hyperlink or not.

(BTW, what does "lj" stand for?)

> > On the other hand, we could say that the trailing-underscore
> > syntax doesn't signify a hyperlink reference, but only indicates a
> > "tagging reference".
> 
> Yes.

OK, I think I understand your proposal now. I've updated "Inline
Substitutions" in http://structuredtext.sf.net/spec/alternatives.txt
to reflect this understanding. Please let me know if I'm mistaken.

> > A tagging reference becomes a hyperlink reference if the contents
> > of the "tag" resolve to a hyperlink.
> 
> Or, rather, a hyperlink *is* a type of tag, and ``__
> http://python.org`` is just sugar for ``__ link http://python.org``.
> We're not adding a construct. We're replacing a construct with a
> more general one.

We'd be replacing a specific construct, hyperlinks, whose syntax
heuristics and mnemonics are well understood:

- "_" denotes a reference when it's a suffix, and a dereference when
  it's a prefix. Think of "_" as a right-pointing arrow.

- "_" was chosen from Setext partly because it goes well with what we
  see in web pages. Typically, a hyperlink is underlined.

The problem I'm having with this is that we're redefining the syntax
for a relatively straightforward concept, hyperlinks, to a much more
complex one. Hyperlink refs become a special case of "tagged
references". Whether ``this_`` would become a hyperlink reference, an
icon, or something else entirely, is only apparent from examination of
the target: ``.. _this:``. We could say that ``this_`` is "*almost
always* a hyperlink reference (except when the target happens to
contain a directive)", but that's too vague; it's just a thin veil.

When I say ``this_``, I'm making a reference whose text contains
"this" (the text "this" would be underlined in the HTML). If ``this_``
were a "tagged reference" instead, the text "this" might be replaced
by some other text entirely, or a graphic, or nothing at all. If
hyperlink references are just one special case of tagged references, I
don't like the path that the text "this" takes:

- "this" is ripped out of the surrounding text.

- Later in the processing (a post-parse transformation), something
  else (the product of the "tagging target" or "substitution") is put
  in its place.

- Elsewhere, there's a ``.. _this: http://some.uri`` target which is
  really a shortcut for ``.. _this: link_target:: http://some.uri``.
  This is processed into a <reference> which is stored, waiting for
  the aforementioned transformation to pick it up.

One problem is, the "this" in the reference may not match the "this"
in the target. The reference may be ``This_`` or ``THIS_``, while the
target is ``.. _this: ...`` or ``.. _tHiS: ...``. The <reference> will
be constructed at parse time, so its contents will get the *target's*
text. Anonymous targets wouldn't know what text to use.

OK, so to solve this problem, when we do the substitution
transformation, we take the text from the ``This_`` reference and plug
it into the <reference> element.

This seems far too complicated and roundabout. Hyperlink references
are common enough to require their own unique syntax and direct
conceptual resolution.
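
For contrast, direct resolution can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
function names and the dictionary of targets are hypothetical, and
collapsing whitespace during name matching is an assumption of the
sketch (the case-insensitive matching is what the ``This_`` versus
``.. _tHiS:`` example implies).

```python
from typing import Dict


def normalize_name(name: str) -> str:
    """Fold case and collapse whitespace so ``This_`` matches ``.. _tHiS:``."""
    return " ".join(name.lower().split())


def resolve_reference(ref_text: str, targets: Dict[str, str]) -> Dict[str, str]:
    """Resolve a hyperlink reference directly against its target.

    The link text always comes from the reference, never from the
    target, so the writer's own capitalization survives.
    """
    by_name = {normalize_name(name): uri for name, uri in targets.items()}
    # Raises KeyError if the reference has no matching target.
    return {"text": ref_text, "refuri": by_name[normalize_name(ref_text)]}


# Target written as ``.. _tHiS: http://some.uri``, reference as ``This_``.
targets = {"tHiS": "http://some.uri"}
link = resolve_reference("This", targets)
```

The reference text is never removed from the paragraph and put back
later; it stays where the writer typed it, and only the URI is looked
up.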

> > >    Link targets which are also legal directive names must be
> > >    enclosed in backquotes.
> > 
> > The frequency of link targets would far outweigh directives, so
> > markup would suffer from extra syntax on targets.
> 
> Anything with a slash or an at-sign doesn't need to be escaped.
> This is the vast majority of cases.

That's not acceptable. However, changing the target syntax as I showed
earlier removes this restriction::

    .. _name: directive:: data

So it's not really an issue.

I already added one special case when we added the indirect target
syntax, ``.. _target: reference_``. Simple URIs cannot end with an
underscore unless it's escaped. The value of indirect targets
outweighed the cost of the special case. In this case however, the
cost is too high, so the extra syntax ("::") to *avoid* the cost is
worthwhile.
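
To show why the "::" marker removes the ambiguity, here is a
hypothetical Python sketch that classifies the contents of a target.
The regular expression and the ``classify_target`` function are
assumptions made for illustration, not parser code; the sketch does
not unescape a trailing ``\_`` and does not handle backquoted phrase
references.

```python
import re
from typing import Tuple

# Assumed shape of a directive inside a target: a name, "::", optional data.
DIRECTIVE_RE = re.compile(r"^(?P<name>[A-Za-z][\w-]*)::(?:\s+(?P<data>.*))?$")


def classify_target(contents: str) -> Tuple[str, ...]:
    """Classify what follows ``.. _name:`` in a target."""
    contents = contents.strip()
    match = DIRECTIVE_RE.match(contents)
    if match:
        # ``.. _name: directive:: data``
        return ("directive", match.group("name"), match.group("data") or "")
    if contents.endswith("_") and not contents.endswith("\\_"):
        # ``.. _target: reference_`` (indirect target)
        return ("indirect", contents[:-1])
    # Anything else is treated as a URI, with no escaping required.
    return ("uri", contents)


examples = [
    classify_target("directive:: data"),
    classify_target("http://www.zope.org/"),
    classify_target("reference_"),
]
```

With this rule a target such as ``.. _Zope: http://www.zope.org/``
needs no backquotes or escapes, because a URI scheme is followed by
"://" and never by a bare "::".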

[Alan]
> > > 5) Roles can go away.  We don't need them.  Optionally if we
> > >    want the ability to put short directive names inline, we
> > >    could declare ::
> > > 
> > >       `foo:: bar bar bar`

[David]
> > Similar syntax has already been considered and rejected. See
> > http://structuredtext.sf.net/spec/alternatives.txt, "Interpreted
> > Text 'Roles'" alternative 1.

[Alan]
> Alternative 1 is more ambiguous than what I'm suggesting, and does
> not have the benefit of consistency with out-of-line directives.
> 
> However, which syntax to use for simple inline directives is a minor
> side issue, and I shouldn't have combined it with this proposal.
> More important is whether "roles" and "directives" and "tags" should
> all be unified. I think they should. It adds both simplicity and
> power.

I don't think they should. Interpreted text is another form of inline
markup, the role usually inferred from context. Explicit "roles" are
like adjectives: strong, literal, red, cold. (Not all role names are
adjectives though.) Roles describe their text but don't alter it;
they're for extending the built-in set of descriptive inline tags.
"Directives" are a constructive extension mechanism. They're for
arbitrary structures where there is no built-in syntax. I understand
your concept of "tags" to be the inline equivalent of directives. But
the "tagged" text *is* inline, and we're striving for readability, so
we don't want all the details to be inline. "Tags" could be called
"indirect inline directive references".

So "roles" and "tags" are not compatible; they can't be unified.
"Tags" and "directives" *are* compatible though. "Tags" are just
inline references to directives.

> I realize that what I'm suggesting sounds more complicated. However,
> that's largely because I'm explaining it relative to the current
> spec, not from scratch. I'm keeping in mind "does this make it
> easier or more difficult to explain how to use the language in under
> two minutes" (or under 40 lines of email) because I'm going to have
> to do just that. I don't think I'm compromising that goal; if
> anything what I'm suggesting brings us *closer*.

I find the overloading of what is now "hyperlink reference" syntax
very confusing, and it adds a lot of conceptual complexity. Heck, it's
taken this long for *me* to "get it", to understand what you're
proposing.

> I'm writing a Zope product akin to the existing STXDocument to use
> rST in Zope without explicitly invoking converters. I'm planning to
> train people on it and start having them use it by next *week*.
> (Freeze or no freeze - if the spec changes on me after that, well,
> I'll suffer.)

That's great. Feedback from actual use will be much appreciated.

> One cool thing about more powerful inline tagging is that we don't
> have to add any more special characters. If we want to add
> substitution, or inline images, or the Spanish Inquisition, we can
> do that with the tag syntax rather than more punctuation.

At the expense of shoehorning inline tagging onto the same syntax as
hyperlink references.

[Paul]
> Consistency is high on my list of priorities. I don't really like
> David's ```/subst/``` syntax, either, because it loses the "anything
> in \` characters is interpreted text" rule.

The ```/subst/``` syntax is just the best I could come up with. If I
hear a better suggestion, I'll take it.

[Paul]
> > Are you defining a construct which starts with ```{`` and ends
> > with ``}`__``?

[Alan]
> Ack, no!  I'm saying that in the existing construct ::
> 
>     `content content content`__
> 
> curly braces in the content would have to be escaped. So if the
> content was ``{'a':1, 'b':2}`` you'd have to write ``\{'a':1,
> 'b':2\}`` instead.

To reiterate, I can't see ever accepting this embedded curly brace
syntax, so you might as well drop it. It's confusing the main issue
anyhow.

[Alan]
> Well, it's become clear that I need to provide much stronger
> arguments for why richer inline markup is important. I feel pretty
> strongly that this is something that will come to bite us later, and
> not very much later, if we want rST to thrive as a general markup.

reStructuredText will never be a general-purpose markup like XML or
TeX. It's a limited, what-you-see-is-pretty-close-to-what-you-get
markup syntax, for converting plain text to structured formats. As
such, we must accept certain limitations. Interpreted text,
directives, and substitutions all extend the basic set of constructs
so that any "almost there" structure can be represented.

> I'll try to send out an edited and clearer proposal with a wider
> variety of examples within a day or two.

Looking forward to it.

-- 
David Goodger    goodger@users.sourceforge.net    Open-source projects:
 - Python Docstring Processing System: http://docstring.sourceforge.net
 - reStructuredText: http://structuredtext.sourceforge.net
 - The Go Tools Project: http://gotools.sourceforge.net