[Doc-SIG] Formalizing ST

Guido van Rossum guido@digicool.com
Thu, 29 Mar 2001 12:20:06 -0500


> Indentation for structure is contentious with many people, and whilst it
> *sounds* like a good idea (especially to Python people) many object to
> ending up with the bulk of their text indented.

What worked against this specific ST feature is that in ZWikis, you
end up editing sizeable documents in a text box in Netscape, which has
no support for auto-indentation.

> > However the url detection without requiring '<' and '>'
> > delimiters around the http:// ... string is a nice feature
> > of MoinMoin markup.
> 
> You haven't been following the me and Edward Loper (and Edward
> Welbourne) flurry of emails over recent weeks, have you?
> 
> The trouble with finding *bare* URIs in a text document written by
> humans with punctuation is that, in the general case, you can't do it.
> For instance, a URI is allowed to end with a dot ('.'). So how do you
> cope with a sentence that ends with http://www.tibsnjoan.co.uk/. Is that
> last dot part of the URI or not? There are other issues about what can go
> inside the URI, as well. Yes, people can come up with ad-hoc solutions
> (docutils/stpy.py works reasonably well), but they are ad-hoc and not
> guaranteed to work. This disturbs some people (I'm not *too* fussed, but
> then I'd err on the side of detecting *too many* URIs, I think, which I
> know would upset some people).

The FAQ wizard uses a simple and sufficient rule, which almost never
misfires: it scans up to whitespace, and then trims punctuation
characters from the back.  While URLs certainly *can* end in
punctuation, I have never seen URLs that *did*.  Invariably, a
trailing period or comma is part of the sentence, not part of the URL.
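
A minimal sketch of that rule in Python.  This is not the FAQ wizard's
actual code: the names find_urls, URL_RE and TRAILING_PUNCT, and the
exact sets of schemes and punctuation characters, are assumptions made
for illustration.

```python
import re

# Assumed scheme list: scan from the scheme up to the next whitespace.
URL_RE = re.compile(r'(?:https?|ftp)://\S+')

# Assumed set of characters that usually belong to the sentence,
# not to the URL, when they appear at the very end.
TRAILING_PUNCT = '.,;:!?)\'"'

def find_urls(text):
    """Return the URLs found in text, with trailing punctuation trimmed."""
    urls = []
    for match in URL_RE.finditer(text):
        # rstrip only removes characters from the back, so punctuation
        # inside the URL is left alone.
        urls.append(match.group(0).rstrip(TRAILING_PUNCT))
    return urls

print(find_urls("a sentence that ends with http://www.tibsnjoan.co.uk/. Is that"))
# ['http://www.tibsnjoan.co.uk/']
```

A URL that really does end in a period would be cut short by this, which
is the trade-off described above.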

> The *only* safe way (and note that this is an option in MoinMoin also)
> is to delimit the URIs with some mechanism, and '<..>' is at least a
> fairly traditional solution.

Which unfortunately means you would have to escape each < or > that
was not meant to be a URL delimiter.  These occur frequently in Python
code samples (``if i < 10: print i'') but also, and I would say more
frequently, in any documentation that describes XML or HTML samples.
I find the ability to write "<HR> and <hr> are equivalent in HTML but
not in XHTML" more important than the ability to mark URLs
unambiguously, given the success rate of existing heuristics there.

> > Ping has implemented something similar in pydoc already
> > and this works just fine.
> 
> See above - it's "modulo just fine" I'm afraid (Ping is happy with
> approximate solutions that find too many instances - somewhat more than
> myself - so *of course* pydoc does what it does (and of course it
> should)).

That's a new meaning of "modulo". :-)

> > I have a similar feeling with the email address recognition
> 
> Erm - email addresses should be presented as URIs, honest.

Yeah, right.  Tough luck getting people to add mailto: to their
address.  Be practical, and add a hyperlink to anything that looks
like an email address -- if you don't eat any characters that were
present in the source, somewhat overzealous recognition won't hurt.
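
Roughly what I have in mind, as a sketch only: the pattern is a loose
assumption about what "looks like an email address", and link_emails
and EMAIL_RE are made-up names.  The point is that the matched text is
kept verbatim and only wrapped, so a false positive costs nothing but a
dead link.

```python
import re

# Deliberately loose pattern (an assumption, not a full address grammar):
# a local part, '@', and a dotted domain.  A dot must be followed by at
# least one domain character, so a sentence-ending period is not matched.
EMAIL_RE = re.compile(r'[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+')

def link_emails(text):
    """Wrap anything that looks like an email address in a mailto: link."""
    def wrap(match):
        addr = match.group(0)
        # The original characters are all kept as the link text.
        return '<a href="mailto:%s">%s</a>' % (addr, addr)
    return EMAIL_RE.sub(wrap, text)

print(link_emails("Write to guido@digicool.com."))
```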

> > About lists and numbered lists I'm still not sure what I would like.
> > A bullet item list (LaTeX itemize) seems to be enough for most cases.
> 
> No, that is not sufficient. There are too many of us who *want* (no,
> *need*) more sorts of list (believe me, I've been using a too-simple
> internal markup tool for C function header comments for years, and it
> has only one type of list, delimited by '@' - it's not sufficient -
> people end up writing lists out "by hand", which rather circumvents the
> point).

What exactly is lacking in that tool?  Nested lists?  We can do those.
Numbered lists?  We don't need autonumbered lists, so we can require
that the numbers are already in the source.
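
To illustrate, here is a small sketch of recognizing numbered items when
the numbers are written out in the source.  The "1." item syntax, the
use of leading whitespace for nesting, and the names ITEM_RE and
parse_numbered_items are all assumptions for the example, not a
proposal for the actual markup.

```python
import re

# Assumed item syntax: optional indentation, digits, a period, then text.
ITEM_RE = re.compile(r'^(\s*)(\d+)\.\s+(.*)$')

def parse_numbered_items(lines):
    """Return (indent, number, text) for each line that is a numbered item.

    The number is taken from the source as written; nothing is renumbered.
    """
    items = []
    for line in lines:
        match = ITEM_RE.match(line)
        if match:
            indent, number, text = match.groups()
            items.append((len(indent), int(number), text))
    return items

print(parse_numbered_items(["1. first", "  2. nested", "not an item"]))
# [(0, 1, 'first'), (2, 2, 'nested')]
```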

--Guido van Rossum (home page: http://www.python.org/~guido/)