[Doc-SIG] URI schemes (was Re: [Docstring-develop] DPS - possible bugs/features)

David Goodger goodger@users.sourceforge.net
Mon, 24 Sep 2001 22:43:38 -0400


[Again, of general interest. Especially: Does anyone know of a URI scheme
registry or official list? (URI schemes are "http", "ftp", "mailto", etc.;
the part of a URI before the ":".)]

[Tony]
> Hmm. Will `a:b` be treated as a URI? (I haven't tested it).

Yes, it will, and in fact you *have* tested it! ``[a:b]`` turned into
``[<link refuri="a:b">a:b</link>]`` in the example from your original
message. (The square brackets are not significant.)

> Is ``a:b`` *really* likely to be a sensible URI, given that ``a`` is
> entirely "local"?

What do you mean by "local"?

> Should we be treating with the whole possible gamut of URIs, or
> restricting ourselves to those most likely?

There are two approaches:

1. Recognize all possible URI schemes, based on the grammar from
   RFC 2396. This has the unwanted side effect that ``a:b`` is
   accidentally recognized as a URI. The workarounds are to use inline
   literals (not always appropriate: "the signal:noise ratio" is
   ordinary prose, not literal text) or to escape the colon (ugly).

2. Recognize only "registered" URI schemes. Accidents like ``a:b``
   won't happen. The disadvantage is that new URI schemes need to be
   added to the parser. I have yet to find a definitive registry of
   URI schemes (anybody know of one?), and I don't want to spend the
   rest of my life adding new schemes as they pop up.

Currently the reStructuredText parser takes approach #1. I wouldn't
want to attempt #2 without an official & complete URI scheme reference.
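
To make the trade-off concrete, here is a rough sketch of the two
approaches in Python. It is not the parser's actual code: the names
``SCHEME``, ``KNOWN_SCHEMES``, ``is_uri_any_scheme`` and
``is_uri_known_scheme`` are made up for illustration, the scheme list
is deliberately tiny, and only the scheme part follows the RFC 2396
grammar (the rest of the URI is just "anything without whitespace").

```python
import re

# Scheme grammar from RFC 2396:
#   scheme = alpha *( alpha | digit | "+" | "-" | "." )
SCHEME = r"[a-zA-Z][a-zA-Z0-9+.\-]*"

# Simplification (assumption): after the colon, accept any run of
# non-whitespace characters instead of the full RFC 2396 grammar.
URI_PATTERN = re.compile(r"(%s):(\S+)" % SCHEME)

# Approach 2 needs a hand-maintained list; this one is hypothetical
# and far from complete.
KNOWN_SCHEMES = frozenset(["http", "ftp", "mailto"])


def is_uri_any_scheme(text):
    """Approach 1: accept any syntactically valid scheme."""
    return URI_PATTERN.fullmatch(text) is not None


def is_uri_known_scheme(text):
    """Approach 2: accept only schemes found in KNOWN_SCHEMES."""
    match = URI_PATTERN.fullmatch(text)
    return match is not None and match.group(1).lower() in KNOWN_SCHEMES
```

With these definitions ``is_uri_any_scheme("a:b")`` is true, while
``is_uri_known_scheme("a:b")`` is false; the price of the second
function is that ``KNOWN_SCHEMES`` has to be kept up to date by hand.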

-- 
David Goodger    goodger@users.sourceforge.net    Open-source projects:
 - Python Docstring Processing System: http://docstring.sourceforge.net
 - reStructuredText: http://structuredtext.sourceforge.net
 - The Go Tools Project: http://gotools.sourceforge.net