[Doc-SIG] looking for prior art

David Goodger goodger@python.org
Thu, 05 Dec 2002 21:45:14 -0500


Doug Hellmann wrote:
> I'm pretty sure HappyDoc was written before the compiler module was
> generally available

I suspected as much.  Either that, or you're a glutton for punishment
;-)

> I've only had to make a few minor modifications to it in the past,
> since the language syntax hasn't evolved that far.

That's good to know.  Still, the parser.suite() approach seems a lot
harder.

> I'm working on a major overhaul of HappyDoc anyway, so now might be
> the time to rewrite the parsing stuff to use the compiler module.
> If you're interested in collaborating, let me know.

I am, definitely.  What I'd like to do is to take a module, read in
the text, run it through the module parser (using compiler.py and
tokenize.py) and produce a high-level AST full of nodes that are
interesting from an auto-documentation standpoint.  For example, given
this module (x.py)::

    # comment

    """Docstring"""

    """Additional docstring"""

    __docformat__ = 'reStructuredText'

    a = 1
    """Attribute docstring"""

    class C(Super):

        """C's docstring"""

        class_attribute = 1
        """class_attribute's docstring"""

        def __init__(self, text=None):
            """__init__'s docstring"""

            self.instance_attribute = (text * 7
                                       + ' whaddyaknow')
            """instance_attribute's docstring"""


    def f(x, y=a*5, *args):
        """f's docstring"""
        return [x + item for item in args]

    f.function_attribute = 1
    """f.function_attribute's docstring"""

The module parser should produce a high-level AST, something like this
(in pseudo-XML_)::

    <Module filename="x.py">
        <Comment lineno=1>
            comment
        <Docstring lineno=3>
            Docstring
        <Docstring lineno=...>           (I'll leave out the lineno's)
            Additional docstring
        <Attribute name="__docformat__">
            <Expression>
                'reStructuredText'
        <Attribute name="a">
            <Expression>
                1
            <Docstring>
                Attribute docstring
        <Class name="C" inheritance="Super">
            <Docstring>
                C's docstring
            <Attribute name="class_attribute">
                <Expression>
                    1
                <Docstring>
                    class_attribute's docstring
            <Method name="__init__" argnames=['self', ('text', 'None')]>
                <Docstring>
                    __init__'s docstring
                <Attribute name="instance_attribute" instance=True>
                    <Expression>
                        (text * 7
                         + ' whaddyaknow')
                    <Docstring>
                        instance_attribute's docstring
        <Function name="f" argnames=['x', ('y', 'a*5'), 'args']
                  varargs=True>
            <Docstring>
                f's docstring
            <Attribute name="function_attribute">
                <Expression>
                    1
                <Docstring>
                    f.function_attribute's docstring

compiler.parse() provides most of what's needed for this AST.  I think
that "tokenize" can be used to get the rest, and all that's left is to
hunker down and figure out how.  We can determine the line number from
the compiler.parse() AST, and a get_rhs(lineno) method would provide
the rest.
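(A note for anyone reading this later: the Python-2-era compiler module has
since been removed from the standard library, so here is a rough sketch of
the same idea using its modern successor, ast, plus tokenize for the
comments the AST discards.  The helper names below are illustrative only,
not moduleparser.py's actual API.)

```python
import ast
import io
import tokenize

# A trimmed-down version of the x.py example above.
SOURCE = '''\
# comment

"""Docstring"""

a = 1
"""Attribute docstring"""

class C:
    """C's docstring"""
'''

def attribute_docstrings(module):
    """Pair each simple assignment with a string literal that
    immediately follows it (the attribute-docstring convention)."""
    found = {}
    for prev, node in zip(module.body, module.body[1:]):
        if (isinstance(prev, ast.Assign)
                and isinstance(node, ast.Expr)
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, str)):
            found[prev.targets[0].id] = node.value.value
    return found

def comments(source):
    """Comments never reach the AST; recover them with tokenize."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(tok.start[0], tok.string) for tok in tokens
            if tok.type == tokenize.COMMENT]

module = ast.parse(SOURCE)
print(ast.get_docstring(module))     # the module docstring
print(attribute_docstrings(module))  # {'a': 'Attribute docstring'}
print(comments(SOURCE))              # [(1, '# comment')]
```

The line numbers carried on every ast node (node.lineno) play the role of
the get_rhs(lineno) lookup described above: they let the tokenize stream
and the parse tree be stitched together.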

The Docutils Python reader component will transform this AST into a
Python-specific doctree, and then a `stylist transform`_ would further
transform it into a generic doctree.  Namespaces will have to be
compiled for each of the scopes, but I'm not certain at what stage of
processing.
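(Again with hindsight: the namespaces-per-scope step can be prototyped with
the stdlib symtable module, which builds exactly this kind of scope tree
from source text.  This is only a sketch of the idea, not a claim about how
the Python reader will actually do it.)

```python
import symtable

# The scopes from the x.py example: module, class C, C.__init__, f.
SOURCE = """\
a = 1

class C:
    class_attribute = 1
    def __init__(self, text=None):
        self.instance_attribute = text

def f(x, y=a * 5, *args):
    return [x + item for item in args]
"""

def dump(table, indent=0):
    """Recursively list each scope and the names bound in it."""
    pad = "    " * indent
    print("%s%s %r: %s" % (pad, table.get_type(), table.get_name(),
                           sorted(table.get_identifiers())))
    for child in table.get_children():
        dump(child, indent + 1)

top = symtable.symtable(SOURCE, "x.py", "exec")
dump(top)
```

Each scope's get_identifiers() is the namespace to attach to the
corresponding Module/Class/Function node in the high-level AST.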

It's very important to keep all docstring processing out of this, so
that it remains completely generic and not tool-specific.

For an overview see:

    http://docutils.sf.net/pep-0258.html#python-source-reader

For very preliminary code see:

    http://docutils.sf.net/docutils/readers/python/moduleparser.py

For tests and example output see:

    http://docutils.sf.net/test/test_readers/test_python/test_parser.py

I have also made some simple scripts to make "compiler", "parser", and
"tokenize" output easier to read.  They use input from the
test_parser.py module above.  See showast, showparse, and showtok in:

    http://docutils.sf.net/test/test_readers/test_python/

.. _pseudo-XML: http://docutils.sf.net/spec/doctree.html#pseudo-xml
.. _stylist transform:
   http://docutils.sf.net/spec/pep-0258.html#stylist-transforms

-- 
David Goodger  <goodger@python.org>  Open-source projects:
  - Python Docutils: http://docutils.sourceforge.net/
    (includes reStructuredText: http://docutils.sf.net/rst.html)
  - The Go Tools Project: http://gotools.sourceforge.net/