[Doc-SIG] docutils status report 2

Fri, 8 Dec 2000 14:17:27 -0000

Here is a further status report on the docutils work.

I currently have some code that will:

  1. Find the docstrings in a Python file (for now, just its own source file!)
  2. Split the text into paragraphs at blank lines.
  3. Identify lines within a paragraph that start like a list
     item, and split there as well - this allows::

         This is a paragraph.
         1. So is this
         fred -- and so is this

     to produce a paragraph and two lists.
  4. Group list items into lists (so that, to use HTML terms, a "bare"
     <li> becomes a child of an <ol>, or whatever list type is appropriate).
  5. Identify paragraphs starting ">>>" (allowing leading whitespace)
     as Python code (i.e., literal)
  6. Recognise bullet list items (as in StructuredText, although the use
     of "o" as a bullet may go away following David's comments).
  7. Recognise numbered list items (but the final dot *is* required,
     otherwise step 3 above would go wrong on::

        My favourite drink is tea, but also
        I like coffee

     by taking the "I" at the start of the second line for a Roman numeral!).
     Note that one won't be able to do::

        And the final number is
        1.

     without a spurious list, but I reckon we can live with that!
     (We have to pay for apparent simplicity with true complexity.)
  8. Recognise descriptive list items. Markup is allowed in the "title"
     of the item, and I'm still hoping to get round to enabling::

        ' -- ' -- This is an awkward case

     (I haven't yet tweaked the RE to allow it).
  9. Recognise *emphasised* text, **strong** text and 'literal' text.
     Nesting of markup does not work, except by "accident" (and you can't
     *nest* markup inside literal text anyway, since it won't be seen!).
     The emphasised and strong texts may contain any characters (except
     the terminating sequence, of course), and inline literals may
     contain anything but "'". Escaping characters is not yet addressed.
 10. Produce something that's getting close to a sensible DOM tree
     (the only issue is "paragraph within paragraph", which will be
     addressed in my documentation - see next week!).
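
Steps 1 and 2 above can be sketched in a few lines of modern Python. This
is an illustration only, not the code this report describes: it assumes
the standard ast module (so docstrings are found without importing the
file), and the function names docstrings and paragraphs are hypothetical.

```python
import ast
import re

def docstrings(source):
    """Step 1 (sketch): yield (name, docstring) for the module and for each
    class and function defined anywhere in the given Python source."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef,
                             ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)  # also strips common indentation
            if doc is not None:
                yield getattr(node, "name", "<module>"), doc

def paragraphs(text):
    """Step 2 (sketch): split text into paragraphs at blank lines.

    A "blank" line may contain spaces or tabs, and a run of several blank
    lines counts as a single separator.
    """
    return [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
```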
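
Step 3, together with the item recognition of steps 5 to 8, might look
roughly like this. The regular expressions are guesses at the rules
described above, not the real ones, and the names BULLET, NUMBERED,
DESCRIPTIVE, PYTHON, starts_item, split_at_items and classify are all
hypothetical.

```python
import re

# Hypothetical patterns; each is used with .match(), so it anchors at the
# start of a line (or of a chunk).
BULLET = re.compile(r"\s*[-*o]\s+")              # "- item", "* item", "o item"
NUMBERED = re.compile(r"\s*(?:\d+|[ivxlcdm]+|[a-z])\.(?:\s|$)",
                      re.IGNORECASE)             # "1. ", "iv. ", "b. " (dot required)
DESCRIPTIVE = re.compile(r"\s*\S.*?\s--\s")      # "fred -- and so is this"
PYTHON = re.compile(r"\s*>>>")                   # interactive Python session

def starts_item(line):
    """True if a line looks like the start of a list item."""
    return any(p.match(line) for p in (BULLET, NUMBERED, DESCRIPTIVE))

def split_at_items(paragraph):
    """Step 3: start a new chunk at every line that looks like a list item."""
    chunks, current = [], []
    for line in paragraph.splitlines():
        if current and starts_item(line):
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

def classify(chunk):
    """Steps 5 to 8: label a chunk by how its first line starts."""
    if PYTHON.match(chunk):
        return "python"
    for kind, pattern in (("bullet", BULLET), ("numbered", NUMBERED),
                          ("descriptive", DESCRIPTIVE)):
        if pattern.match(chunk):
            return kind
    return "paragraph"
```

With these patterns the tea-and-coffee example stays one paragraph,
because NUMBERED insists on the dot, while "And the final number is / 1."
splits off a spurious one-item numbered list: the trade-off accepted in
step 7. The bare "o" bullet shows its cost too, since a line starting
"o dear" becomes a list item.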
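
Step 4, collecting "bare" items into their enclosing lists, can be
sketched with the standard xml.dom.minidom module. The function takes
(kind, text) pairs such as a block classifier might produce. The element
names (docstring, p, pre, ul, ol, li, dl, dt, dd) follow HTML as the
report does, but they are assumptions, as is splitting descriptive items
at the first " -- ". Nesting by indentation is not attempted.

```python
from itertools import groupby
from xml.dom.minidom import getDOMImplementation

# Hypothetical mapping from chunk kinds to HTML-style list elements.
LIST_TAGS = {"bullet": "ul", "numbered": "ol", "descriptive": "dl"}

def build_tree(blocks):
    """Wrap each run of adjacent same-kind list items in one list element.

    blocks is a sequence of (kind, text) pairs; returns a minidom Document
    whose root element is <docstring>.
    """
    doc = getDOMImplementation().createDocument(None, "docstring", None)
    root = doc.documentElement
    # groupby merges only *adjacent* blocks of the same kind.
    for kind, run in groupby(blocks, key=lambda block: block[0]):
        if kind in LIST_TAGS:
            lst = doc.createElement(LIST_TAGS[kind])
            root.appendChild(lst)
            for _, text in run:
                if kind == "descriptive":
                    # Term before the first " -- ", description after it.
                    term, _sep, desc = text.partition(" -- ")
                    for tag, value in (("dt", term.strip()), ("dd", desc.strip())):
                        el = doc.createElement(tag)
                        el.appendChild(doc.createTextNode(value))
                        lst.appendChild(el)
                else:
                    li = doc.createElement("li")
                    li.appendChild(doc.createTextNode(text))
                    lst.appendChild(li)
        else:
            # Paragraphs and Python examples are not grouped.
            for _, text in run:
                el = doc.createElement("pre" if kind == "python" else "p")
                el.appendChild(doc.createTextNode(text))
                root.appendChild(el)
    return doc
```

Because groupby merges only neighbours, a paragraph between two numbered
items yields two separate <ol> elements, which is probably what a reader
of the source text would expect.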
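
The non-nesting inline markup of step 9 can be approximated with a single
regular expression whose alternatives try "**" before "*". This is a
hypothetical sketch (INLINE and inline_markup are invented names). Like
the behaviour described above, it lets emphasised and strong text contain
anything but the closing sequence, lets literals contain anything but
"'", and does no escaping.

```python
import re

# "**" is tried before "*" so that strong text is not read as emphasis.
INLINE = re.compile(
    r"\*\*(?P<strong>.+?)\*\*"     # **strong**
    r"|\*(?P<emph>.+?)\*"          # *emphasised*
    r"|'(?P<literal>[^']+)'",      # 'literal' (anything but a quote)
    re.DOTALL,
)

def inline_markup(text):
    """Return a flat list of (kind, text) spans, where kind is one of
    'text', 'emph', 'strong' or 'literal'."""
    spans, pos = [], 0
    for m in INLINE.finditer(text):
        if m.start() > pos:
            spans.append(("text", text[pos:m.start()]))
        kind = m.lastgroup                # name of the group that matched
        spans.append((kind, m.group(kind)))
        pos = m.end()
    if pos < len(text):
        spans.append(("text", text[pos:]))
    return spans
```

Because every match is flat, nested markup such as
"*This is **strong** text*" comes out as three separate emphasised
fragments, and the apostrophe in a word like "don't" can be mistaken for
the start of a literal if another "'" appears later in the paragraph.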

That's not too dissimilar to what I reported last week, and indeed the
main thing that I've been working on is the "DOM"ising of the code,
including getting lists to nest themselves nicely (well, the code isn't
neat yet, but it works).

There's also been a fair amount of restructuring, which is partly why
the "::" recognition has gone away (don't worry, it won't be gone for
long).

As before, I'd rather not release code yet, but I can if someone *really*
wants to take a look (damn, that sounds so, well, non-open-source).

Things to do next
-----------------
The task list (in no particular order) includes:

 A. Document the supported pyST syntax, so that David and company can
    haggle over the exact syntax that *should* be supported.

 B. Update the internal documentation.

 C. Add support for more markup (I've got a bare minimum for testing at
    the moment).

 D. Define what the command line interface is (i.e., how to specify that
    one wants to parse a file or package, what one wants the output to
    be, and so on.)

 E. Make nested markup work, so one can do::

        *This is **strong and 'literal'** text within emphasised*

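One possible approach to task E is a small recursive scanner in place of
regular expressions. The sketch below is hypothetical (parse_inline and
its tree of strings and (kind, children) tuples are invented for
illustration). It prefers "**" over "*" wherever both could apply, and
treats 'literal' text as opaque, so markup inside it is never seen.

```python
def parse_inline(text, pos=0, closer=None):
    """Parse text[pos:] into a list of strings and (kind, children) tuples.

    Returns (nodes, new_pos). When closer ("*" or "**") is given, parsing
    stops just after that closing sequence.
    """
    nodes, buf = [], []

    def flush():
        # Move any accumulated plain characters into nodes as one string.
        if buf:
            nodes.append("".join(buf))
            buf.clear()

    while pos < len(text):
        if text.startswith("**", pos):
            flush()
            if closer == "**":
                return nodes, pos + 2
            children, pos = parse_inline(text, pos + 2, "**")
            nodes.append(("strong", children))
        elif text[pos] == "*":
            flush()
            if closer == "*":
                return nodes, pos + 1
            children, pos = parse_inline(text, pos + 1, "*")
            nodes.append(("emph", children))
        elif text[pos] == "'":
            end = text.find("'", pos + 1)
            if end == -1:
                raise ValueError(f"unterminated literal at offset {pos}")
            flush()
            # Literal content is kept verbatim: no markup inside it.
            nodes.append(("literal", text[pos + 1:end]))
            pos = end + 1
        else:
            buf.append(text[pos])
            pos += 1
    if closer is not None:
        raise ValueError(f"missing closing {closer!r}")
    flush()
    return nodes, pos
```

Unterminated markup raises ValueError here; a real parser might prefer to
fall back to treating the stray characters as ordinary text.
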
I intend to do a little more implementation work (but not a lot) and
then tackle task A, the pyST syntax, aiming to have a document by the
end of next week. Fingers crossed, I might still hit the "before
Christmas" release point...

Tibs
--
Tony J Ibbs (Tibs)      http://www.tibsnjoan.co.uk/
"How fleeting are all human passions compared with the massive
continuity of ducks." - Dorothy L. Sayers, "Gaudy Night"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)