[Doc-SIG] looking for prior art

Wed, 04 Dec 2002 22:16:59 -0500

I have begun work on a Python source Reader component for Docutils.  I
expect the work to go slowly, as there is a lot to absorb, a good deal of
earlier work to study and learn from, and little spare time to devote.  I'm
trying to keep it as simple as possible, mostly for my own benefit (lest my
brain explode).

I've looked over the HappyDoc code and Tony "Tibs" Ibbs' PySource prototype.
HappyDoc uses the stdlib "parser" module to parse Python modules into
abstract syntax trees (ASTs), but that seems difficult and fragile, the ASTs
being so low-level.  Tibs' prototype uses the much higher-level ASTs built
by the stdlib "compiler" module, which are much easier to understand.  I've
decided to use the "compiler" module also.
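
To illustrate what a high-level tree makes easy, here is a minimal sketch
that lists a module's classes and functions along with their docstrings.
It is only an illustration under stated assumptions: it uses the stdlib
"ast" module, which took over the role of "compiler" in later Python
versions, and the function name outline() is hypothetical, not part of
Docutils, HappyDoc, or Tibs' prototype.

```python
import ast


def outline(source):
    """Return (kind, name, docstring) tuples for a module's definitions.

    Hypothetical helper for illustration only.  The "ast" module stands in
    for the "compiler" module discussed above.
    """
    tree = ast.parse(source)
    # The module itself has no name; record its docstring under None.
    items = [("module", None, ast.get_docstring(tree))]
    # ast.walk() visits every node, in no guaranteed order.
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            items.append(("class", node.name, ast.get_docstring(node)))
        elif isinstance(node, ast.FunctionDef):
            items.append(("function", node.name, ast.get_docstring(node)))
    return items


if __name__ == "__main__":
    example = '"""Module doc."""\n\nclass C:\n    """A class."""\n'
    for item in outline(example):
        print(item)
```

Each node already carries its name and docstring, so no knowledge of the
grammar's low-level productions is needed.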

My first stumbling block is in parsing assignments.  I want to extract the
right-hand side (RHS) of assignments straight from the source.  In his
prototype, Tibs rebuilds the RHS from the AST, but that seems rather
roundabout and the results may not match the source perfectly (equivalent,
but not character-for-character).  I think using the "tokenize" module in
parallel with "compiler" may allow the code to extract the raw RHS text, as
well as other raw text that doesn't make it verbatim to the AST.
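
A rough sketch of that idea follows, under several assumptions.  The tree
is used only to find which statements are assignments and where they
start; the token stream then supplies the exact source positions of the
"=" and of the end of the statement, and the RHS text is sliced straight
out of the source.  The "ast" module stands in for "compiler" here, the
name extract_rhs() is hypothetical, only module-level assignments are
handled, and the source is assumed to be ASCII so that tree and token
column offsets agree.

```python
import ast
import io
import tokenize

OPENERS = {"(", "[", "{"}
CLOSERS = {")", "]", "}"}


def extract_rhs(source):
    """Return (line number, raw RHS text) for module-level assignments.

    Hypothetical helper for illustration only.
    """
    lines = source.splitlines(keepends=True)

    def text_between(start, end):
        # start and end are (row, col) pairs with 1-based rows, as
        # reported by tokenize.
        (srow, scol), (erow, ecol) = start, end
        if srow == erow:
            return lines[srow - 1][scol:ecol]
        middle = "".join(lines[srow:erow - 1])
        return lines[srow - 1][scol:] + middle + lines[erow - 1][:ecol]

    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    results = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign):
            continue
        depth = 0                        # bracket nesting level
        equals_left = len(node.targets)  # "a = b = 1" has two targets
        rhs_start = rhs_end = None
        for tok in tokens:
            if tok.start < (node.lineno, node.col_offset):
                continue                 # belongs to an earlier statement
            is_op = tok.type == tokenize.OP
            if is_op and tok.string in OPENERS:
                depth += 1
            elif is_op and tok.string in CLOSERS:
                depth -= 1
            if rhs_start is None:
                # Still in the targets: count top-level "=" operators.
                if is_op and tok.string == "=" and depth == 0:
                    equals_left -= 1
                    if equals_left == 0:
                        rhs_start = tok.end
                continue
            # Past the last "=": stop where the statement ends.
            if tok.type in (tokenize.NEWLINE, tokenize.ENDMARKER):
                break
            if depth == 0 and (tok.type == tokenize.COMMENT
                               or (is_op and tok.string == ";")):
                break
            rhs_end = tok.end
        if rhs_start is not None and rhs_end is not None:
            rhs = text_between(rhs_start, rhs_end).strip()
            results.append((node.lineno, rhs))
    return results


if __name__ == "__main__":
    example = 'x = (1 +\n     2)  # comment\nnames = [ "a",  "b" ]\n'
    for lineno, rhs in extract_rhs(example):
        print(lineno, repr(rhs))
```

Because the text is sliced from the source rather than regenerated,
parentheses, spacing, and line breaks inside the RHS survive exactly as
written; a trailing comment is left out.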

So, is there any prior art out there?  Any pointers or advice?

-- 
David Goodger  <goodger@python.org>  Open-source projects:
  - Python Docutils: http://docutils.sourceforge.net/
    (includes reStructuredText: http://docutils.sf.net/rst.html)
  - The Go Tools Project: http://gotools.sourceforge.net/