[PYTHON DOC-SIG] setext in doc strings

Jim Fulton jim.fulton@digicool.com
Mon, 05 Aug 1996 16:42:03 -0400


Robin Friedrich wrote:
> 
> I've been working with Daniel Larsson on gendoc. Currently there is a
> little setext parser built into gendoc which identifies text structure
> and stores the components in a metadocument which can be rendered in a
> number of output formats (notably HTML and MML).  Since most folks are
> not necessary familiar with setext markup I'd like to provide a brief
> synopsis. If you use this stuff in your doc strings nice things will
> happen to your autogenerated manuals.:-)

First, I apologize for the tardiness of my reply.

I spent some time looking at setext after the workshop and was fairly
underwhelmed.  The setext documents I looked at were rather ugly in
their basic form, and the example setext converted to HTML was often
broken.  I also had a tough time making out the setext documentation,
which colored my opinion somewhat.

In a separate note, I released a Structured text module that I consider
to be superior to setext in several ways:

  - The source text is more readable,
  - It supports arbitrary levels of nesting, including numbered,
    bulleted and descriptive lists.
  - It generates HTML tags like <strong> and <em>, rather than <b>
    and <i>.

> SETEXT 101
> ==========
> 
> Below is the setext definitions from the BSDI project. Note that not
> all tags are supported (or needed) in python doc strings.

This looks like the documentation I found for setext.  I had trouble
making it out then and have trouble making it out now. :-|

> 
> Valid Typotags Table
> ---------------------
>  ____________________  ___________________  _______________ ____________ v14
>  current (online) use  setext form          acted upon or        name of
>  of text emphasis      of same              displayed as     the typotag  ?
>  ====================  ===================  =============== ============ ===
>  Internet mail header  From <source>        Subject: shown    subject-tt (a)
>  (start of a message)  minimal mail header  [Date: & From:]

I assume this doesn't apply to us?

>  --------------------  -------------------  --------------- ------------ ---
>  title (1 per text)   "Title                a title             title-tt (b)
>  in distinct position  ====="               in chosen style

Is gendoc using this?  This mechanism of setext is rather restrictive
and ugly.

>  --------------------  -------------------  --------------- ------------ ---
>  heading (1+/ text)   "Subhead              a subhead         subhead-tt (c)
>  in distinct position  -------"             in chosen style

Ditto.

>  --------------------  -------------------  --------------- ------------ ---
>  body text               66-char lines in-  lines undented     indent-tt (d)
>  [plain not-indented]    dented by 2 space  and unfolded

Ditto.

>  --------------------  -------------------  --------------- ------------ ---
>  1+ bold word(s)           **[multi]word**  1+ bold word(s)      bold-tt (e)

*multi word* would be more readable and follows standard conventions.
I think emphasis is better than bold.  This is what I did in
StructuredText.
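
To make the single-asterisk convention concrete, here is a minimal
Python sketch of the idea.  It is not the StructuredText code; the
function name emphasize() and the mapping of **text** to <strong> are
assumptions made for illustration only.

```python
import html
import re

# Assumed conventions for this sketch only:
#   **text** -> <strong>text</strong>
#   *text*   -> <em>text</em>
# The marked text must start and end with a non-space character, so a
# bullet such as "* Lettuce" is left alone.
_STRONG = re.compile(r"\*\*(\S(?:.*?\S)?)\*\*")
_EM = re.compile(r"\*(\S(?:.*?\S)?)\*")


def emphasize(text):
    """Return text with asterisk markup replaced by logical HTML tags."""
    escaped = html.escape(text, quote=False)  # protect <, > and & first
    escaped = _STRONG.sub(r"<strong>\1</strong>", escaped)
    return _EM.sub(r"<em>\1</em>", escaped)


print(emphasize("a *multi word* phrase"))
print(emphasize("* Lettuce"))
```

The point of the sketch is that the output uses logical tags, so the
choice of bold or italic is left to whatever renders the HTML.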

>  a single italic word               ~word~  1 italic word      italic-tt (f)

This looks ugly.  Why specify italic directly?  Doesn't this run
counter to HTML philosophy?

If the group wants this, I'd be willing to add it to StructuredText.  
If I do, what constitutes a 'word'?

>  1+ underlined words        [_multi]_word_  underlined text underline-tt (g)

What constitutes a word?  Does this run afoul of
multi_word_python_variable_names?
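
A small Python sketch shows why I ask.  Both patterns below are
hypothetical readings of the underline typotag, not what gendoc or any
setext tool actually does: NAIVE takes the table literally, while
STRICT adds an assumed rule that the outer underscores must not touch
other word characters.

```python
import re

# Literal reading: an underscore, one word, an underscore.
NAIVE = re.compile(r"_([A-Za-z0-9]+)_")

# Assumed stricter rule: the opening and closing underscores may not be
# adjacent to letters, digits, or underscores on the outside.
STRICT = re.compile(
    r"(?<![A-Za-z0-9_])_([A-Za-z0-9]+(?:_[A-Za-z0-9]+)*)_(?![A-Za-z0-9_])"
)


def underlined(pattern, text):
    """Return the spans that the given pattern would treat as underlined."""
    return pattern.findall(text)


name = "multi_word_python_variable_names"
print(underlined(NAIVE, name))    # pieces of the identifier get marked up
print(underlined(STRICT, name))   # the identifier is left alone
print(underlined(STRICT, "an _underline_a_phrase_ here"))
```

Under the literal reading, an ordinary identifier is chopped into
underlined fragments; some boundary rule like the assumed one is needed
before this typotag is safe in doc strings.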

>  hypertextual 1+ word        [multi_]word_  1+ hot word(s)        hot-tt (h)

This is weird.  Where is the reference?  Has this been implemented in
gendoc?

>  >followed by text     >[space][text]       > [mono-spaced]   include-tt (i)

This looks like a quoted email message.  But I guess it makes sense.

>  bullet-text in pos1   *[space][text]       [bullet] [text]    bullet-tt (j)

I think 'o text' and '- text' are more readable.

>                        `_quoted typotag!_`  `_left alone!_`     quote-tt (k)

`_e_gads!_`  I like 'this much better'

>  --------------------  -------------------  --------------- ------------ ---
>  [hypertext link def] ^.. _word URL         jump to address      href-tt (l)
>  [hypertext note def] ^.. _word Note:("*")  ("cause error")      note-tt (m)

I have no idea what this means.

>  --------------------  -------------------  --------------- ------------ ---
>  end of first? setext  $$ [last on a line]  [parse another]   twobuck-tt (n)
>                       ^..[space][not dot]   [line hidden]    suppress-tt (o)
>  logical end of text  ^..[alone on a line]  [taken note of]    twodot-tt (p)

Huh?

>  ====================  ===================  =============== ============ ===
> 
>  Note: only one instance of the element (c) (or, in its absence, (b))
>     is absolutely _required_ for a text to be considered a valid setext.
> 
>  All the elements but (c) are in effect optional, not necessary for
>     a setext to be declared as such.  Element (a) deals with setexts
>     that arrive via email and end up being parsed (processed) as
>     unedited mailbox files; fully employed the (a), (b) and (c) make
>     it possible to distribute "multisetexts", i.e.  setexts with one
>     additional level of logical structure (= more than one setext per
>     message; more than one message in a mailbox).  If such file is
>     viewed as a multisetext it will result in 3-level-outline
>     structure: mail-subjects become top-level chapters, setext titles
>     denote subchapters (topics) and the subheads yet finer threads
>     within these (still a notch ABOVE mere "paragraphs of text").
> 
>  $$
> -----------------------------------------------------------------------
> The following doc string example illustrates the usage of all setext
> constructs recognized by the gendoc tool. (i think)
> 
> class Setext(Text):
>     """Lets you change markup to stylize your text
> 
>     SETEXT 102
>     ==========

This is not valid setext.  Setext wants the titles and headings to start 
in column 1 and the other text in column 3, like this:

SETEXT 102
==========

  **Setext** can be used to mark your text in a non-obtrusive
  manner. Text within double asterisks are treated as bold, ...
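
The column rule can be sketched in a few lines of Python.  This is
only an illustration under my reading of the table above (titles are
underlined with "=", subheads with "-", and both start in column 1);
find_headings() is a made-up name, not part of gendoc or any setext
tool.

```python
def find_headings(lines):
    """Return (kind, text) pairs for lines that look like setext headings."""
    headings = []
    for text, underline in zip(lines, lines[1:]):
        if not text or text[0].isspace():
            continue  # headings must start in column 1
        if underline and set(underline) == {"="}:
            headings.append(("title", text))
        elif underline and set(underline) == {"-"}:
            headings.append(("subhead", text))
    return headings


valid = ["SETEXT 102", "==========", "", "  **Setext** can be used ..."]
indented = ["    SETEXT 102", "    ==========", "", "    **Setext** ..."]
print(find_headings(valid))
print(find_headings(indented))
```

The indented form yields no headings at all, which is the problem with
the quoted doc string as written.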

>     **Setext** can be used to mark your text in a non-obtrusive
>     manner. Text within double asterisks are treated as bold,
>     while single words with tilde at the front and back are
>     rendered as ~Italic~. You can _underline_a_phrase_ but it
>     will be rendered as bold in HTML. Placing hyperlinks
>     is easy; just hilite_the_tag_ and at the bottom of the doc
>     string include the address which it points to on a line by
>     itself.
> 
>     New paragraphs are separated by blank lines.
>     > And a bunch of literal text
>     > can be specified with the left
>     > arrow. This gets marked as <pre> in HTML.
>     Otherwise the text will be wrapped according to whatever
>     output formatter is used.
> 
>     A bulleted list is done with single asterisks thusly:
>     * Lettuce
>     * Onions
>     * Pickles
> 
>     Extension to setext
>     -------------------

Ditto.

>     A frequent construct in python doc strings is to list ones
>     keyword arguments. This made us wish for a way to specify
>     a definition list so that it looks nice is html (and others).
>     I propose the following. I have this working in my version.
>     The double colons won't be in the output.
> 
>     item1 :: definition 1
>     item2 :: definition 2
>     item3 :: a rather long and involved definition for item 3
>              spanning more than one line.
>     item4 :: back to brevity with definition 4

Why not:

      item1 -- Definition 1
      ...

This looks much better to me, and works with StructuredText.
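
For what it's worth, the "--" form is also easy to pick apart.  The
Python sketch below is an illustration only, not the StructuredText
parser; parse_descriptive_list() and its handling of continuation
lines are assumptions.

```python
import re

# A term, then " -- ", then the start of its definition.
ITEM = re.compile(r"^(\S.*?)\s+--\s+(.*)$")


def parse_descriptive_list(lines):
    """Return (term, definition) pairs from 'term -- definition' lines.

    Assumption: a non-blank line without ' -- ' continues the previous
    definition.
    """
    items = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = ITEM.match(line)
        if match:
            items.append([match.group(1), match.group(2)])
        elif items:
            items[-1][1] += " " + line
    return [(term, definition) for term, definition in items]


doc = [
    "item1 -- Definition 1",
    "item3 -- a rather long and involved definition for item 3",
    "         spanning more than one line.",
    "item4 -- back to brevity with definition 4",
]
for term, definition in parse_descriptive_list(doc):
    print(term, "=>", definition)
```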
 
>     .. _hilite_the_tag http://www.python.org
>     """
> 
> Notes:
> 
> The indenting inserted by python-mode for the entire doc string is
> detected and processed out before setext rules are applied. So
> eventhough titles for example are required to start in column one they
> will if they obey the overall indenting for that doc string.

Hm.
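
For readers who want to see what that preprocessing amounts to, here
is a rough Python sketch.  It uses the standard library's
inspect.cleandoc() as a stand-in; I am not claiming this is how gendoc
does it, and normalized_docstring() is a made-up helper name.

```python
import inspect


def normalized_docstring(obj):
    """Return the doc string lines with the editor's indentation removed."""
    doc = obj.__doc__ or ""
    # cleandoc() strips the common leading margin from every line after
    # the first, plus leading and trailing blank lines.
    return inspect.cleandoc(doc).splitlines()


class Setext:
    """Lets you change markup to stylize your text

    SETEXT 102
    ==========

      Body text indented by two spaces.
    """


for line in normalized_docstring(Setext):
    print(repr(line))
```

After this step the title starts in column 1 and the body keeps its
two-space indent, so column-based rules can be applied to the result.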
 
> The underlines for the title and subtitle should be the same length as
> the title itself.
> 
> Spaces around tokens are important (for the "* ", "> ", and " :: ")
> 
> Comment are hearby welcome.

I think my structured text module mechanism provides richer 
text formatting with less obtrusive markup, especially for 
strings that have much structure, as many of mine do.

Jim

-- 
Jim Fulton         Digital Creations
jim@digicool.com   540.371.6909
## Python is my favorite language ##
##     http://www.python.org/     ##

=================
DOC-SIG  - SIG for the Python Documentation Project

send messages to: doc-sig@python.org
administrivia to: doc-sig-request@python.org
=================