[Doc-SIG] In-line hyperlink alternatives

Benja Fallenstein b.fallenstein@gmx.de
Fri, 15 Nov 2002 18:15:57 +0100


Hi David!

David Goodger wrote:

>Unfortunately, there's a final, showstopper problem with this syntax:
>RFC 2732 ("Format for IPv6 Literal Addresses in URL's") adds the "["
>and "]" characters to the set of possible URI characters.  This means
>we can't surround URIs with "[]" with the current parser, which is
>intentionally limited in its inline markup parsing ability (uses
>regexps).  Here's an example::
>
>    http://[3ffe:2a00:100:7031::1]/
>

Wow. Oops. OK, point well taken; I had missed that update to RFC 
2396 until now, and had assumed that [] were still reserved URI chars.

>[in a follow-up:]
>  
>
>>I've come up with a third variation that doesn't break the _
>>convention as much as the other two:
>>
>>An `example hyperlink` <http://example.com>_.
>>    
>>
>
>From there it's a *very* short step back to::
>
>    An `example hyperlink <http://example.com>`_.
>
>One underscore means "named", two means "anonymous", same as in the
>rest of the cases.
>

Well, yes, it can be argued that it is just one backquote moving a 
little forwards. Ultimately, this is a question of taste, but I still 
find that the first version is quite a bit more readable; there, I'm 
able to parse the backquotes as a marker for the extent of the link (as 
in `example hyperlink`_), and the angle-bracketed text as an annotation 
to the link: my interpretation of the syntax is `example hyperlink`_ 
with an interspersed annotation that gives the URI inline. With `example 
hyperlink <http://example.com>`_, on the other hand, I find it harder to 
ignore the URI when reading: my eyes search for the corresponding 
closing marker to the first backquote, which in the context of reST I 
interpret as an opening marker (like an opening bracket). What happens 
is that the URI jumps into the foreground (because it's immediately 
before the closing backquote my eyes are searching for) and no longer 
looks like the annotation I'm used to from plain text.

Now, I can understand that you don't want to implement backtracking in 
the parser for this, but I don't actually see why that's necessary (then 
again, I'm still trying to grasp how the parser works, so if I'm 
misinterpreting here, I'd be glad to be corrected). As far as I can 
see, in ``parsers/rst/states.py``, you already distinguish between 
inline literals and single-backquoted text; then at a later point I 
think you further distinguish between single-backquoted phrase refs 
(underscore at end) and single-backquoted domain-specific text (no 
underscore at end).

How about simply introducing another case, inline hyperlinks? The 
opening marker would be a single backquote (i.e., a backquote not 
preceded or followed by another backquote, as currently). The closing 
marker would be identified by the following regular expression::

    r'`\s*<' + uri + r'>_'

(Can be improved by allowing for a second underscore at the end and 
checking that whitespace or punctuation follows.) Possibly we'd have to 
do a little more parsing to get the URI out of the angle brackets, but 
that won't be hard. -- Ok, maybe this isn't extremely beautiful, but 
from what I understand now it could work without implementing backtracking.
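
To make the idea concrete, here is a rough standalone sketch, not the actual docutils code. It folds the opening marker, the link text, and the closing marker into one pattern, with the optional second underscore and the trailing whitespace/punctuation check included. The ``uri`` pattern and the names ``INLINE_LINK`` and ``find_inline_hyperlinks`` are assumptions made up for illustration; the parser's real ``uri`` pattern would be stricter.

```python
import re

# Assumption: a deliberately loose URI pattern (anything without whitespace
# or angle brackets). It accepts "[" and "]", so IPv6 literals pass through.
uri = r'[^\s<>]+'

INLINE_LINK = re.compile(
    r'(?<!`)`(?!`)'            # opening marker: a single backquote
    r'([^`]+)'                 # link text, containing no backquotes
    r'`\s*<(' + uri + r')>'    # closing backquote, then <URI>
    r'(__?)'                   # "_" = named, "__" = anonymous
    r'(?=[\s.,;:!?)]|$)'       # whitespace, punctuation, or end must follow
)


def find_inline_hyperlinks(text):
    """Return a list of (link_text, uri, is_anonymous) tuples."""
    return [(m.group(1), m.group(2), m.group(3) == '__')
            for m in INLINE_LINK.finditer(text)]


if __name__ == '__main__':
    print(find_inline_hyperlinks(
        'An `example hyperlink` <http://example.com>_.'))
```

Because the link text may not contain a backquote, ordinary phrase references such as `example hyperlink`_ and inline literals are left for the existing rules to handle.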

Again, it's a matter of taste whether this is worth the effort; for the 
reasons above, in my humble opinion, it is ;-)

- Benja