[Doc-SIG] New document - pytext-fat

Tony J Ibbs (Tibs) tony@lsl.co.uk
Fri, 30 Mar 2001 11:50:20 +0100


Guido van Rossum on fat.html 1.0
> - I think that the references to DOM trees are unnecessarily
>   implementation details -- even though I like the idea of a
>   formalized tree representation.

As I say elsewhere, that's for my convenience - I'm trying to maintain
*one* document, so it has to have everything stuffed into it.

>(I happen to think DOM is overrated,

Oh, I don't disagree - but it's there and we have tools to use it. Rather
an available common format than something I home-grow, if we want tools
to be able to use multiple parsers.

>but I won't object against its use -- I do object against
>   mentioning it in the spec.)

It would not be in a spec that *just* addressed the language for users.

> - Call me oldfashioned, but I like having an extra space between
>   sentences.  Emacs text mode has some very good heuristics for this.

OK, I'll call you old-fashioned. Seriously, having been a long term TeX
user, I used to try to sell this to people. It doesn't work. You are, of
course, allowed to put the spaces in, and also to use a formatter that
puts them back again...

> - Why is it bad to insist on whitespace between list items?  That
>   would make the block rules simpler and cause significantly fewer
>   situations where a list item is mistakenly started.  At the very
>   least I'd insist on a blank line before the first item and after the
>   last item of a list.

Long (past) discussion - the case was made cogently that we would lose
people if we didn't do it, and it's *not* hard to do. It was much
demanded, so it is there.

> - The indentation rules are essentially those used for Python source.
>   I'm glad you say outright that you won't use them in the way ST uses
>   them

I'll say again, we are *not* STClassic. We never intended to be.

>   -- as you may know by now, I think ST's use of indentation
>   level to derive heading levels is painful.

We were stuck with that (for the moment) when we were being
"compatible". But even then plans were afoot (at least in my head) to
get away from it.

> - Using --- for descriptive list is no better than --.  (Note that
>   there's a typo in the example -- the second example uses '--'
>   instead of '---'.)  If you *have* to have descriptive lists, try
>   doing something creative with input that already looks the way you
>   propose that descriptive lists be rendered.

No, you're missing the point again.

*In the docstring, the text should read like normal text*.

That's point 1. So use of "--" or "---" or whatever is the natural way
to do it. This point that the docstring should look like *normal* text
is *so* important - it's the *really* important idea behind ST, not all
the cruft about particular implementations.

*In the formatted result* I don't know *how* descriptive lists will
look - that depends so much on the formatting mechanism (HTML, XHTML, an
SGML thingy, TeX, LaTeX, texinfo, PDF, uncle tom cobbly and all) that
there's no way I can (or should) mandate that.

>   I think that maybe, as
>   long as you are measuring indentation anyway, you *should* consider
>   the indentation of subsequent lines, requiring all lines of a text
>   paragraph to be indented the same, and marking the start of a new
>   block on a change in indent.  After all, if we want the source to be
>   readable, we can't tolerate a ragged left margin (except in literal
>   blocks).  Sure, there are people who indent the first line of their
>   paragraphs.  There are also typesetting conventions that *dedent*
>   the first line of each paragraph.  But for plain text, I've found
>   both conventions ugly and distracting, and I wouldn't mind ruling
>   these out so we can use indentation changes for other purposes.

Internal paragraph indentation is a difficult one to decide about. For
simplicity, ignoring it is the best approach, because it makes lists and
things easier (I don't believe that *most* people want to write::

	1. This is some
      text - isn't that ugly

and if they have to indent there... (I don't, though, think we *should*
require them to have to)).

> - Requiring the spaces around the delimiter doesn't help (it's too
>   subtle).

(a) I disagree, and (b) it doesn't matter if "---" is a reserved
sequence, 'cos it's not allowed except in that one context.

> - An alternative could be to use -- or --- but require some kind of
>   explicit markup to start and end a descriptive list.

It would, but I think it would be unobvious.

> - I'm glad that you don't auto-renumber ordered lists.

Previously stpy did, because of compatibility with ST. I didn't like it
then, and would have considered it a thing to talk about after the 1.0
release (heh, I *don't* think even the first version would be perfect!).

> - I'm not sure that there's a point to allowing disjoint text for
>   ordered (or any kind of) list.  Again, if we believe that the source
>   should read as well as plain text, we should require that it is
>   formatted neatly.  The disjoint text example you give seems to come
>   straight from the LaTeX (or similar) manual where it explains that
>   whitespace details in the input are ignored.  But we *shouldn't*
>   ignore any whitespace details in the input, since that's our main
>   clue!

If you mean the::

	1.

         and this is in the list

type of example, it is there (so far as I am concerned) because text
needs formatting before it is finished, and thus allowing the "empty
bits I've to write yet" is good practice.

> - Having to work around auto-detection of numbered lists is my #1 ST
>   pet peeve.  I know that part of that's a ST bug -- but I still
>   believe ordered lists are not sufficiently important to warrant the
>   pain they occasionally cause.

Then I think we may have a fairly fundamental problem/disagreement.
Although the *only* problem I can see is that if one has:

	Some text which runs on and uses a German ordinal
	1. (is that right?) part way through

then it will go wrong. I understand that you hate that with a deep and
abiding hate (is the problem with the bullet characters for unordered
lists as offensive?). It is a troubling matter. Personally, I don't much
care, but I *do* like having "natural" lists, and am willing to go to
(some) slight trouble typing to get them.

In all of the instances quoted to me, though, it would be possible to
warn the user that there may be a problem (i.e., there's no punctuation
at the end of the previous line).
And it is *very very* obviously wrong if you actually look at the
formatted output (which I would hope authors would do!).
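
A minimal sketch of such a warning, in Python. The function name, the
regular expression and the set of "terminating" punctuation are all
assumptions made for illustration, not part of any existing tool:

```python
import re

# Hypothetical pattern: a line that starts like an ordered-list item ("1. ").
LIST_START = re.compile(r"^\s*\d+\.\s")
# Assumed set of characters that may legitimately end the line before a list.
TERMINATORS = ".:;!?"


def suspicious_list_starts(text):
    """Return 1-based numbers of lines that may be accidental list items."""
    lines = text.splitlines()
    hits = []
    for i in range(1, len(lines)):
        if not LIST_START.match(lines[i]):
            continue
        prev = lines[i - 1].rstrip()
        # A blank previous line, or one that is itself a list item,
        # suggests a genuine list rather than running text.
        if not prev or LIST_START.match(prev):
            continue
        if prev[-1] not in TERMINATORS:
            hits.append(i + 1)
    return hits


sample = (
    "Some text which runs on and uses a German ordinal\n"
    "1. (is that right?) part way through\n"
)
print(suspicious_list_starts(sample))
```

This only looks at the line immediately before the candidate item, so a
multi-line list item followed by another item would still be flagged; a
real checker would need to track the current block as well.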

>   The Emacs text mode I use
>   automatically detects numbered lists and it is *never* what I want.

I'm not responsible for that, nor do I know how it works.

>   At the very least you should require that the rest of the input is
>   neatly formatted the way one would format an ordered list in a plain
>   text document.

That *who* would format it neatly? Using whose convention? I don't like
trying to impose my own formatting conventions on people's use of text
(and I think I would lose).

> - The paragraph about intermingling is ambiguous.  Is it natural to
>   have a list with some ordered items, some items using *, and some
>   items using -?  I think not.  If you meant nesting, of course you're
>   right -- but please say so, and give an example.

Ah - sorry. The point is that the formatter needs to decide that a
bullet item followed by a number item is actually two lists, not one.
I'll have to look at rewriting that (it *was* written very fast!).
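
As a rough sketch of that decision (Python; the names are hypothetical,
and treating "*" and "-" as the same kind of bullet is an assumption,
not something settled here):

```python
import re
from itertools import groupby

# Hypothetical pattern for an ordered item such as "1. text".
ORDERED = re.compile(r"^\d+\.\s")


def marker_kind(item):
    """Classify a list item by the kind of marker it starts with."""
    if ORDERED.match(item):
        return "ordered"
    if item[:2] in ("* ", "- "):  # assumed bullet characters
        return "bullet"
    return "text"


def split_lists(items):
    """Group adjacent items with the same marker kind into separate lists."""
    return [(kind, list(group)) for kind, group in groupby(items, key=marker_kind)]


items = ["* apples", "* pears", "1. wash", "2. slice"]
for kind, members in split_lists(items):
    print(kind, members)
```

A bullet item followed by a numbered item therefore ends one list and
starts another, rather than producing one mixed list.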

> - Do we really need more than two levels of headings?  I kind of doubt
>   it.  Alternatively, we could allow numbered headings (of course the
>   numbers have to be supplied by the author) and derive the level from
>   the structure of the number.  (Q: are unnumbered headings at higher
>   or lower levels than numbered headings?  I dunno!)

I would probably be OK with one. I'd fairly certainly be OK with two.
But given using underlines, it's easy to do three, and I'm *sure* I'd be
OK with three.

I deliberately left other schemes for headings alone (did I? I think I
left stuff out) because (a) it's not important in a first release - heck,
it's long enough already, and (b) it's really only needed for longer
documents.

> - About dedented paragraphs after indented sections: you can't really
>   express in regular text that a plain paragraph is not part of the
>   previous section unless you insert a heading.  Maybe a better
>   alternative (again using the rule that we should never ignore the
>   whitespace clues in the source!) would be to simply indent indented
>   headings and paragraphs, a la <blockquote>.

Are we misunderstanding each other?

One of the problems, for many people, with ST is the need to indent
sub-sections. I agree that this can be a pain. I deliberately dropped it
for fat.html, as a requirement, but it is still perfectly legitimate to
indent the text if you want, and as (I thought it said) then dedenting
will end an indented section.

Remember that we are aiming at docstrings, where the number of headings
(especially given label blocks) is likely to be low.

> - I like the idea of anchor blocks -- they seem to be like References
>   in scientific papers.  But why do they have to start with two dots?
>   And how much semantics (as opposed to formatting) do they need?

The two dots makes it clear one is not starting the paragraph with a
local reference. One needs *some* markup to do that, and someone came up
with two dots last time round the Doc-SIG loop. I just kept the idea.

> - Labels: I'm not sure I get the point.  What is this for?  The
>   "explanation" doesn't explain it for me.  I think this is digressing
>   too far from the "plain text as documentation" idea.

Last time round the Doc-SIG loop, there were a couple of requests that
tied together.

Several concepts like "Author" and "Arguments" came up, and there was a
wish to generalise these, partly because we couldn't predict all of
them. Some of them admit of having their information on one line, and
it's nice to be able to control that. All of them contain information
one might imagine a tool wanting to extract from the document,
"standard" parts of text (cf the non-HTML in javadoc).

Some people wanted to be able to have extensibility built in - arbitrary
additional tags. They were made happy because the label (tag as it was
then) can map easily to an XML tag.

They also *look* like what people put in docstrings anyway - so we might
as well gain leverage off that.

> - The concept of children seems wrong for literal blocks.  I agree
>   with the rule that a literal block starts after a paragraph ending
>   in "::" and ends at the first line that's indented the same or less
>   as that "::" paragraph; but I would propose that conceptually, the
>   entire literal block is a child of the previous paragraph.

It is - that sounds like my explanation is deficient.

A literal block is a single block. It cannot have children (by
definition - they would be part of it, since their indentation is less
than that of their parent block).

I'll need to look at the explanation again - it's probably too close to
implementation-speak.
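
The rule quoted above (start after a paragraph ending in "::", stop at
the first line indented the same or less) might be sketched in Python
like this. The helper names are hypothetical, and the sketch assumes
indentation uses spaces only:

```python
def indent_of(line):
    """Number of leading spaces (tabs are not handled in this sketch)."""
    return len(line) - len(line.lstrip(" "))


def literal_block(lines, start):
    """Collect the literal block that follows lines[start].

    lines[start] is assumed to be the last line of a paragraph ending
    in "::". Blank lines do not end the block; the first non-blank line
    indented the same as or less than that paragraph does.
    """
    base = indent_of(lines[start])
    block = []
    for line in lines[start + 1:]:
        if line.strip() and indent_of(line) <= base:
            break
        block.append(line)
    # Trim blank lines at either end; they separate, they are not content.
    while block and not block[0].strip():
        block.pop(0)
    while block and not block[-1].strip():
        block.pop()
    return block


doc = ["Example::", "", "    x = 1", "", "    y = 2", "Back to text"]
print(literal_block(doc, 0))
```

The result is one flat list of lines, which matches the idea that the
literal block is a single block with no children of its own.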

> - The example with a paragraph consisting of *just* "::" should render
>   that as a single colon, to be consistent.  If you think this should
>   be special-cased, you need to explain why -- the argument "(a) it's
>   not worth preventing" doesn't really hold when you special-case it
>   anyway!

This is a problem with a document typed fast, mostly an hour or two
after my normal bedtime. There's one particular subtle "gotcha" of the
indentation rules that it helps around, but more importantly one has to
decide to do *something* about an empty '::' paragraph, and I didn't
want to forbid them, and I don't like a "hanging" colon.

> - You can collapse most of the description of doctest blocks with that
>   of literal blocks -- they are really just a different way of
>   *recognizing* a literal block (the >>> start), they are not to be
>   treated differently (except by doctest).  Note that we may not need
>   to recognize doctest blocks separately -- doctest is perfectly happy
>   with indented doctest blocks.

Not quite, because a doctest does not span blank lines.

And recognising doctest blocks separately lets me (a) have the eventual
formatter present them differently (which I want), and (b) will
eventually let me warn a user that they've got something that doctest
might pick up in a literal block (where it presumably *isn't* Python
code).
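
A sketch of recognising doctest blocks on that basis (Python; the
function name is hypothetical, and starting a block at any line that
begins with ">>>" is a simplifying assumption):

```python
def doctest_blocks(lines):
    """Return runs of lines that start at '>>>' and end at a blank line."""
    blocks = []
    current = None
    for line in lines:
        stripped = line.strip()
        if current is None:
            if stripped.startswith(">>>"):
                current = [line]
        elif stripped:
            current.append(line)
        else:
            # A blank line ends the block: doctest does not span them.
            blocks.append(current)
            current = None
    if current is not None:
        blocks.append(current)
    return blocks


text = ["Some text", "", ">>> 1 + 1", "2", "", ">>> x = 3", "", "more text"]
print(doctest_blocks(text))
```

Running the same function over the lines of a literal block would give
the basis for the warning in (b): any non-empty result means doctest
might pick something up there.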

> - In-line literals: I don't like the use of '...' for literals.  It's
>   too unintuitive (unless you leave the quotes in the output!).

Dealt with by the backtick idea, I hope.

> - The section on Python literals is missing something -- what is a
>   Python literal?  From the example I have to guess that it's
>   something between hash marks.  It's too ugly IMO.

There was a *long* argument on this last time round the Doc-SIG.
Everyone agreed it was ugly, but that was not the main reason for
adopting it. For this one, can I please ask that you look back in the
list?

(for what little it is worth, I started out opposing them and will now
defend them - I want them left in)

> - URL recognition: you know my position. :-)

I am sort of happy with ad-hoc recognition, but it really does give
problems with trailing punctuation (*not* just fullstops), and going for
ad-hoc recognition seems to me to be at odds with the "purity" of your
approach on some other items...
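
To illustrate the trailing punctuation problem, here is a small Python
sketch of ad-hoc recognition. The pattern, the set of stripped
characters and the example.org addresses are all assumptions for the
purpose of the example:

```python
import re

# Deliberately naive: a scheme, "://", then everything up to whitespace.
URL = re.compile(r"(?:https?|ftp)://\S+")
# Assumed set of characters to treat as sentence punctuation, not URL text.
TRAILING = ".,;:!?)'\""


def find_urls(text):
    """Find URLs, trimming punctuation that probably belongs to the prose."""
    return [m.group(0).rstrip(TRAILING) for m in URL.finditer(text)]


sentence = "See http://example.org/docs/, or (ftp://example.org/x)."
print(find_urls(sentence))
```

The trimming is itself a guess: a URL that really does end in ")" or
"." would be cut short, which is the sort of trouble ad-hoc recognition
brings with it.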

> Hope this helps,

Despite my gruntles, opinions are useful. It would just be nice if they
didn't all come at once, and didn't require *me* to stand up for debates
that happened long ago and I only half remember.

Tibs

--
Tony J Ibbs (Tibs)      http://www.tibsnjoan.co.uk/
"How fleeting are all human passions compared with the massive
continuity of ducks." - Dorothy L. Sayers, "Gaudy Night"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)