[Doc-SIG] rST hyperlink syntax

David Goodger goodger@users.sourceforge.net
Thu, 18 Oct 2001 23:04:03 -0400


I've posted "Reworking Explicit Markup", a new section from
alternatives.txt__, separately from this discussion. It contains an
organized analysis of the current and proposed syntaxes.

.. __: http://structuredtext.sourceforge.net/spec/alternatives.txt

[Alan]
> Anchors and named URIs are entirely different - one is a marker of a
> particular position in the document, the other associates a name
> with an external resource and has identical semantics regardless of
> its position in the document - yet they have the same syntax.

Whether they're "entirely different" or quite similar is a matter of
perception. (And terminology! [#]_) Internal hyperlink targets (called
"anchors" above), external targets [#]_ ("named URIs"), and indirect
targets associate a hyperlink name with a destination. The destination
for an internal hyperlink is a position inside the document, while an
external hyperlink's destination is a URI. This difference is
reflected in the link block.

.. [#] Call them internal/external/indirect "hyperlink targets" and
   they seem to have much more in common than calling them by
   completely different names. Let's get our terminology straight.
   Please see "Reworking Explicit Markup", posted separately.

.. [#] Note that I recently changed the name from "indirect hyperlink
   target" to "external hyperlink target", because real indirect
   targets (e.g. ``.. _target: reference_``) have since been added.

An anonymous hyperlink target can be external, internal, or indirect.
I don't see much use for anonymous internal hyperlink targets, though.

Another way to think of them is with regard to the hyperlink
reference, the other end of the link. The target of ``hyperlink_`` in
the text may be internal or external, directly or indirectly. There's
no distinction at the reference.

> Comments are freeform, unless they happen to start with an
> underscore, in which case they can't contain a colon on the first
> line.

Stated that way, it sounds much more complicated than it actually
is. There's specific syntax for hyperlink targets, footnotes, and
directives. Anything not matching that syntax is a comment.
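To make that rule concrete, here is a minimal Python sketch of how the
first line of an explicit markup block might be classified under the
current syntax. The regular expressions are simplified, hypothetical
approximations written for this post, not the patterns used by the
actual reStructuredText parser. Each specific construct is tried in
turn; whatever is left over falls through to "comment".

```python
import re

# Hypothetical, simplified patterns for the explicit markup constructs in the
# current syntax. They illustrate the rule only; they are NOT the regular
# expressions used by the actual reStructuredText parser.
PATTERNS = [
    # Anonymous target: ".. __: URI". Checked before named targets, because
    # the named-target pattern below would also accept "__:".
    ("anonymous target", re.compile(r"^\.\. __:(\s|$)")),
    # Footnote: ".. [label] text", where "[#]" is auto-numbered.
    ("footnote", re.compile(r"^\.\. \[[^\]\s]+\](\s|$)")),
    # Internal, external, or indirect target: ".. _name:" plus optional link block.
    ("hyperlink target", re.compile(r"^\.\. _[^:\s][^:]*:(\s|$)")),
    # Directive: ".. name:: arguments".
    ("directive", re.compile(r"^\.\. [\w-]+::(\s|$)")),
]


def classify(first_line: str) -> str:
    """Classify the first line of an explicit markup block."""
    if first_line != ".." and not first_line.startswith(".. "):
        raise ValueError("not an explicit markup start: %r" % first_line)
    for kind, pattern in PATTERNS:
        if pattern.match(first_line):
            return kind
    # Anything not matching a specific construct is a comment.
    return "comment"


if __name__ == "__main__":
    for line in [
        ".. _blah: http://somewhere",
        ".. _blah:",
        ".. _target: reference_",
        ".. __: http://somewhere",
        ".. [#] An auto-numbered footnote.",
        ".. directive:: foo",
        ".. blah: http://somewhere",
        ".. _blah without a colon",
    ]:
        print("%-36s %s" % (line, classify(line)))
```

In this sketch the last two sample lines both come out as comments: an
underscore-initial comment is only reinterpreted when it actually has
the shape of a hyperlink target.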

> All of them start with a comment syntax intended to imply "hidden",
> except that footnotes usually appear in visible output, and link
> URLs sometimes do.

[Tony]
> That's not my reading - I read them as "special" things, not
> "hidden" things.

I agree. The explicit markup start (not "comment syntax"), ".." at the
beginning of a block, indicates "exceptional" or "special". Again, a
matter of perception.

At present, there is a two-level syntax. First-level syntax is for the
common, easily recognizable constructs: the "what you see is what you
get" syntax. The explicit markup syntax is the one exception; it opens
up a second-level syntax, generally more abstract and less
representational, for special constructs. The first question here is,
do we keep two levels of syntax, or do we (at least partially) flatten
them by making new first-level syntax for all the second-level
constructs? If we choose to keep the two levels, the next question is:
can we modify the second-level syntax to make certain constructs
simpler, and how?

[Alan]
> IMHO, the attempt to cram these various constructs into ``.. ``
> seems like a mess, with very little advantage.  If we ditch that
> requirement::
> 
>     __ http://somewhere           anonymous uri
>     __ blah: http://somewhere     named uri
>     __ _blah                      anchor
>     __ [blah] blah blah           footnote
>     .. blah: http://somewhere     comment
>     .. directive:: foo            directive
> 
> Anchors are shorter, no longer look like refuris, and don't
> have the misleading colon which leads one to expect another argument,
> containment of a block, or both.

Again, a matter of perception. I don't consider the colon on internal
targets to be misleading at all. ``.. _a:`` says "the hyperlink 'a'
refers to whatever comes next", be it URI, reference, or a location in
its own document.

> Everything which can be linked to starts with ``__ ``,

Good, except for the inconsistent internal target ("anchor"), which
has an extra underscore.

> There's a potential ambiguity between anchors and anonymous URIs
> starting with an underscore or square bracket. The URI would have to
> be escaped in that case; no big deal.

It's another exception.

> There's potential for human confusion between anonymous and named
> URIs

I wouldn't worry about that. The presence or absence of a space is
immediately visible and significant to human eyes.

> I'd actually prefer a still different syntax, but it's further out
> in left field, since it requires changing some connotations. ::
> 
>     .. http://somewhere           anonymous uri
>     .. blah: http://somewhere     named uri
>     .. _blah                      anchor
>     .. [blah] blah blah           footnote
>     ## blah: http://somewhere     comment
>     !! directive: foo             directive
> 
> ``##`` strongly evokes "comment" for those familiar with any of a
> variety of scripting languages. It doesn't look hidden, but having
> comments look hidden is a mixed blessing -- IMHO comments *should*
> jump out at you.

I'm neutral on that: +1 for "comment" connotation, -1 for
obtrusiveness of "##", -0 for addition of another first-level
construct.

> ``!!`` is intended to evoke "something odd or surprising is
> happening here".

Again, I'm neutral.

> ``.. `` loses the "hidden" meaning and instead means "leading up to"
> or "side note", and is thus used for targets.

But again, internal targets are inconsistent: they alone need an
underscore.

[Tony]
> I still remember when I finally got David's idea that "we'll delimit
> the *odd* stuff about a document with one symbol, so that the eye
> can spot it easily" (and, implicitly, just by running down the left
> margin of the text, which is a fairly natural thing to do).

That is a significant advantage IMHO.

> In which case, maybe ``#_`` and ``.. _#:`` would be better cases for
> anonymity, for consistency with "anonymous" footnotes. Hmm.

Except that anonymous hyperlinks are not *numbered*, and that
connotation of numbering is why I chose "[#]" for auto-numbered
footnotes.

> Tibs' proposal, scheme 3::
>
>      .. http://somewhere           anonymous uri
>      .. _blah: http://somewhere    named uri
>      .. _blah:                     anchor
>      .. [blah] http://somewhere    footnote
>      .. blah: http://somewhere     comment
>      .. blah:: http://somewhere    directive
> 
> The sole change that this makes to scheme 0 is to say that comments
> may not start with (something that looks like) a URI. We already
> know they may not start with something that looks like a directive,
> so this may not be *too* onerous.

Onerous enough, I think. I don't want to restrict comments any more
than they are already. Putting a URI in a comment is not hard to
imagine.
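The cost is easy to see in a sketch. Below is a hypothetical Python
classifier for Tibs' scheme 3 as quoted above. The URI test is a
deliberately crude assumption (a scheme name, a colon, then a
non-space character), and none of the patterns are a proposed
implementation; they only show which lines would be captured.

```python
import re

# Hypothetical, simplified patterns for Tibs' "scheme 3". The URI test is a
# crude assumption: a scheme name, a colon, then a non-space character.
SCHEME_3 = [
    # Directives first: "blah::" would otherwise pass the URI test below.
    ("directive", re.compile(r"^\.\. [\w-]+::(\s|$)")),
    ("anonymous uri", re.compile(r"^\.\. [A-Za-z][A-Za-z0-9+.-]*:\S")),
    ("named uri", re.compile(r"^\.\. _[^:\s][^:]*:\s+\S")),
    ("anchor", re.compile(r"^\.\. _[^:\s][^:]*:$")),
    ("footnote", re.compile(r"^\.\. \[[^\]\s]+\](\s|$)")),
]


def classify_scheme_3(first_line: str) -> str:
    """Classify the first line of an explicit markup block under scheme 3."""
    for kind, pattern in SCHEME_3:
        if pattern.match(first_line):
            return kind
    # Only what is left over remains a comment.
    return "comment"


if __name__ == "__main__":
    # A comment that happens to begin with a URI is captured as a target.
    print(classify_scheme_3(".. http://example.org/ is down again"))
```

Under these patterns, ``.. http://example.org/ is down again`` is
read as an anonymous URI target rather than a comment, which is
exactly the extra restriction on comments objected to above.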

> The disadvantage is that we've lost the initial "_", which served as
> a hint that this was a target URI - but then we don't have that for
> footnotes (in that case, because losing the underscore makes the
> footnote look more like a footnote).

Also the idea (rationalization) was that, like a section title, a
footnote generates an implicit hyperlink target, thus obviating the
need for an initial "_".

[Alan]
> I'd also like to see a clearly-specified markup completely displace
> STX, ending STX `dialect proliferation`_ in Zope once and for all.
> reST won't do this unless it's a clear win for all current STX
> users, which includes eliminating all areas in which reST is more
> awkward than STX. As far as I can see, hyperlink syntax is the only
> such area.

I understand and agree with Alan's concerns, and I *would* like to
accommodate his users' needs. I think such improvements to
reStructuredText are feasible and they would be overall improvements.
However, the syntax must be consistent, unsurprising, and orthogonal.
If that's not possible, then the added syntax must be minimally
intrusive. I don't think we're there yet.

I've made a minimal-change suggestion, #5 in the next post.

Perhaps we need to look at this in a totally different way. Radical
change, not incremental. Revolution, not evolution. Any ideas?

-- 
David Goodger    goodger@users.sourceforge.net    Open-source projects:
 - Python Docstring Processing System: http://docstring.sourceforge.net
 - reStructuredText: http://structuredtext.sourceforge.net
 - The Go Tools Project: http://gotools.sourceforge.net