[Doc-SIG] lightweight markup: bullets

Edward D. Loper edloper@gradient.cis.upenn.edu
Sat, 14 Apr 2001 13:05:12 EDT


> i would like to see the structured-text-ish approach.

In my mind, there are 2 things we're encoding here: structuring
(lists, sections, literal blocks, etc.), and colorizing (emphasis,
inline literals, etc.).  Colorizing only occurs within a paragraph.

I've been working on both designing & implementing a parser for a
markup language for docstrings.  The structuring is based on the
structured-text-ish approach.  I'm currently undecided about whether I
want to do colorizing like E{this} or like *this*.  The advantage of
the former is that it means you can have more types of colorizing
(e.g., colorizing for URIs, for code, for emphasis, for math, for
definitions of terms that should be included in indices, etc.).  The
advantage of the latter is that it's presumably more readable.  But if
we go with the latter, I think we need to constrain ourselves to maybe
1 or 2 different colors (emph and code/identifier?  or just
identifier?).
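
As a rough illustration of why the E{this} style scales to many
colors, here is a minimal Python sketch (not the parser mentioned
above).  The tag letters in TAGS are assumptions made up for this
example; only "E" appears in this message.  Nesting is not handled.

```python
import re

# Hypothetical tag letters; only "E" is taken from the message itself.
TAGS = {"E": "emphasis", "C": "code", "U": "uri", "M": "math", "X": "index-term"}

# One capital letter, then a brace-delimited span with no nested braces.
_SPAN_RE = re.compile(r"\b([A-Z])\{([^{}]*)\}")

def colorize(text):
    """Split text into (color, content) pairs; plain text gets "plain"."""
    pieces = []
    pos = 0
    for m in _SPAN_RE.finditer(text):
        if m.start() > pos:
            pieces.append(("plain", text[pos:m.start()]))
        # Unknown letters are kept, but labelled as such.
        pieces.append((TAGS.get(m.group(1), "unknown"), m.group(2)))
        pos = m.end()
    if pos < len(text):
        pieces.append(("plain", text[pos:]))
    return pieces
```

Adding a new color here is one more entry in the table, whereas the
*this* style needs a new, visually distinct delimiter for each color.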

> Someone (was it you, edward?) mentioned the non-geek CP4E-type
> audience earlier this week - 

I don't remember mentioning them, but I do think we need to keep them
in mind.  That would be one of my objections to some of the escaping
proposals so far.

> i'm dismayed to think that we're talking about exposing them to code
> in docstrings, eg C<> or Z{} or whatever, that's more cryptic than
> lots of python code.  The docstrings should be more self-obvious, not
> less!!

When I see it in context, it actually doesn't seem that cryptic to me.
But then the people we should be asking about that are people who
don't code.  Maybe we should try encoding some docs with both kinds of
markup, and see what they think.  

> You don't need the secret codes or a tool to read the docstrings in
> the program text.

In general, I think that the colorizing should *never* be necessary to
understand what's being said; i.e., you should be able to blindly
ignore any X{}s (the "X{" and the "}", not the content).  The only
place where that wouldn't be true would be if X{}s were used to escape
characters, which should hopefully be very rare.
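
The "blindly ignore" reading can be sketched in a few lines of
Python.  This assumes the X{...} form with a single capital tag
letter, which is only one of the syntaxes under discussion, and it
deliberately does nothing special for escapes.

```python
import re

# Assumed syntax: one capital letter followed by a brace-delimited span.
_COLOR_RE = re.compile(r"\b[A-Z]\{([^{}]*)\}")

def strip_colorizing(text):
    """Drop each "X{" and "}" but keep the content between them."""
    previous = None
    # Each pass removes the innermost spans; repeat until nothing changes
    # so that nested spans are unwrapped too.
    while previous != text:
        previous = text
        text = _COLOR_RE.sub(r"\1", text)
    return text
```

For example, strip_colorizing("Call C{foo(E{x})} now") gives
"Call foo(x) now".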

> Evidently, the trick is coming up with a decent set of structured
> text style rules that are unambiguous and "unsurprising" - in
> particular, conventions that don't collide with common writing
> practices.  (Eg, collide ones recently discussed: use of '--' for
> description lists, or "1." at the end of a sentence but beginning of
> line translating to the start of an ordered list item.)  Once again,
> it seems to me that we're close to this goal, but veering off to a
> new language, with C<> or whatever - totally at the expense of the
> reader.

For structuring, I think I have a set of such rules.  I'll send out
mail about that when I've done more testing etc., but basically:
  1. All paragraphs *must* be left-justified.
  2. All lists must be either indented or separated by a blank
     line.
  3. The second and subsequent lines of a list item must be indented
     further than the bullet.  All lines but the first must be
     left-justified.

     Subsequent paragraphs in the same list item must line up with
     that indentation level.

There are some more, but those are the basics required to avoid
mis-interpreting bullets.  The only true ambiguities you get with
rules like these are things like:
  1. This is a list item whose second line begins with the number
     1.  Was that "1." a bullet or part of a sentence?
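
To make the bullet rules concrete, here is a hedged Python sketch of
how a line might be accepted as the start of a list item.  It is one
reading of rules 1-3 above, not the actual implementation; the names
bullet_lines and indent_of, and the choice of "-" and "N." as the
only bullets, are assumptions for the example.

```python
import re

# Assumed bullets: "-" or a number followed by ".", then space and text.
_BULLET_RE = re.compile(r"^ *(?:-|\d+\.) +\S")

def indent_of(line):
    return len(line) - len(line.lstrip(" "))

def bullet_lines(lines):
    """Return the indices of lines that start a list item."""
    found = []
    prev = ""    # previous line; "" behaves like a blank line
    stack = []   # bullet indents of the lists we are currently inside
    for i, line in enumerate(lines):
        if not line.strip():
            prev = line
            continue
        indent = indent_of(line)
        # A dedent closes any lists whose bullets sit further right.
        while stack and indent < stack[-1]:
            stack.pop()
        is_bullet = bool(_BULLET_RE.match(line)) and (
            not prev.strip()                        # separated by a blank line
            or indent > indent_of(prev)             # indented
            or bool(stack) and indent == stack[-1]  # sibling of an open list
        )
        if is_bullet:
            found.append(i)
            if not stack or indent > stack[-1]:
                stack.append(indent)
        elif stack and indent <= stack[-1]:
            # Text not indented past the bullet ends that list (rule 3).
            stack.pop()
        prev = line
    return found
```

With these checks, a "1." that begins a line in the middle of a
left-justified paragraph is not taken as a bullet.  The ambiguous
example above is still reported as two items, because its second
line is indented further than the first; the rules alone cannot say
whether that was intended.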

> Really, it seems to me that such docstrings would make python code
> *less* readable, not more.

Do you think that we should have any colorizing at all?  If so, what
colors?  People usually talk about *emphasis*, although I very
rarely find it useful in docstrings (despite its usefulness in
*email*).  The color I most often want is something to mark a token as
a python identifier (or, more generally, to mark a string as Python
code).

If we didn't do any colorizing, we would probably have:
  - paragraphs (in which word-wrapping is legal, etc.)
  - literal blocks (which are displayed as-is)
  - doctest blocks (which are displayed as-is, or possibly colorized)
  - lists (ordered and unordered)
  - sections (and subsections)

If there were no colorizing, I'm pretty sure we could get away with no
escaping mechanism (with carefully chosen structuring rules, it would
never be necessary).

-Edward