[Compiler-sig] Re: Status of ast-branch

Jeremy Hylton jeremy@alum.mit.edu
Wed, 28 Aug 2002 11:21:56 -0400


>>>>> "NN" == Neal Norwitz <neal@metaslash.com> writes:

  NN> Jeremy: I've been meaning to ask, but keep forgetting...

  NN> What is the status of the ast branch?  I know it's pretty
  NN> out-dated now.  How close is it to working?  Do you have any
  NN> specific tasks that need to be done?  I've reviewed the
  NN> checkins, but haven't had a chance to try out the branch.

I think the ast-branch needs a solid week of effort to get it
working.  I have a number of uncommitted changes (on my machine at
home) that wire everything together.  The changes are primarily in
pythonrun.c; the various functions that parse and compile code need to
be updated to parse, convert to ast, and compile.
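
For illustration only, the parse / convert-to-AST / compile sequence
can be sketched at the Python level.  This is not the pythonrun.c code
from the branch; it assumes a present-day Python with the standard
`ast` module, and `run_source` is a hypothetical helper name.

```python
import ast


def run_source(source, filename="<string>"):
    """Hypothetical helper: parse, build an AST, compile, then run."""
    # Steps 1 and 2: parse the source text and convert it to an AST.
    tree = ast.parse(source, filename=filename, mode="exec")
    # Step 3: compile the AST to a code object.
    code = compile(tree, filename, "exec")
    # Run the code object in a fresh namespace and hand it back.
    namespace = {}
    exec(code, namespace)
    return namespace


if __name__ == "__main__":
    print(run_source("x = 1 + 2")["x"])
```

Each entry point that currently goes straight from parse tree to
bytecode would gain the middle step shown here.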

I've been really overloaded with some work for Zope customers, so I
haven't had as much time as I'd like to work on the ast-branch.  I
keep telling myself that things will ease up in September and I can
find a week or two to concentrate on it.

  NN> I'm still hoping the new AST/compiler can be implemented for
  NN> 2.3.  pychecker is pretty broken w/2.3 and it would be nice to
  NN> have pychecker[23] working with the new compiler...  and maybe
  NN> even jython.  Pretty big dream right now, especially given how
  NN> little work I've done recently.

I'd also like to finish it all before the first 2.3 alpha release.  If
the work can get done in September, I think it will happen.

I'd break down the work into the following tasks:

- Finish ast.c
  - test it against a large sample of code
  - do sensible error handling and memory management

- Write ast marshallers for C and Python
  - Define canonical binary representation for AST
  - Define Python representation for AST
  - Write marshallers to convert between C/Python/binary reps

- Error checking
  - Detect syntax errors

- Finish code generation pass
  - Handle all the rest of the grammar.  Still to do:
    - functions and classes
    - ~9 statement types
    - ~6 expression types

- Write basic block to bytecode conversion
  - Much like pyassem in the compiler package
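
As a rough sketch of what the basic block to bytecode step in the last
task involves: lay the blocks out in order, compute each block's
starting offset, then emit bytes with jump targets resolved to those
offsets.  The opcode names and numbers, the instruction sizes, and the
`assemble` function below are all assumptions made for the example;
they are not the branch's code or pyassem's API.

```python
import struct

# Hypothetical opcode table, for illustration only.
OPCODES = {"LOAD": 1, "POP": 2, "JUMP": 3, "JUMP_IF_FALSE": 4, "RETURN": 5}
HAS_ARG = {"LOAD", "JUMP", "JUMP_IF_FALSE"}
JUMPS = {"JUMP", "JUMP_IF_FALSE"}


def instr_size(op):
    # Assumed layout: one opcode byte plus a 2-byte argument if present.
    return 3 if op in HAS_ARG else 1


def assemble(blocks):
    """blocks is a list of (label, [(op, arg), ...]) in emission order.

    Jump instructions carry a block label as their argument.
    """
    # Pass 1: record the byte offset at which each block starts.
    offsets = {}
    pos = 0
    for label, instrs in blocks:
        offsets[label] = pos
        for op, _arg in instrs:
            pos += instr_size(op)

    # Pass 2: emit bytes, replacing labels with absolute offsets.
    code = bytearray()
    for _label, instrs in blocks:
        for op, arg in instrs:
            code.append(OPCODES[op])
            if op in HAS_ARG:
                if op in JUMPS:
                    arg = offsets[arg]
                code += struct.pack("<H", arg)
    return bytes(code)


if __name__ == "__main__":
    blocks = [
        ("entry", [("LOAD", 0), ("JUMP_IF_FALSE", "else")]),
        ("then", [("LOAD", 1), ("RETURN", None)]),
        ("else", [("LOAD", 2), ("RETURN", None)]),
    ]
    print(assemble(blocks).hex())
```

Two passes are needed because a forward jump refers to a block whose
offset is not known until everything before it has been sized.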

The first two tasks are easily separated.  The ast.c code could be
repackaged as a C extension module that works with current Python.
Then the ast can be tested against all sorts of real python code, and
the generated AST can be marshalled into the binary representation to
pass back to Python.

I have a few notes on the binary representation, but I think it's a
fairly straightforward problem.  The AST definition (e.g. Python.asdl)
can be used to generate a set of codes for each grammar production.
Then you just need to define universal encodings for the basic types
-- string, number, sequence, optional.
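
A minimal sketch of such an encoding, under stated assumptions: the
`CODES` and `FIELDS` tables are written by hand here (in practice they
would be generated from the ASDL definition), nodes are plain tuples of
the form (kind, field values...), and the byte layout is invented for
the example rather than taken from any actual format.

```python
import struct

# Hypothetical per-production codes and field types.
# "*" marks a sequence, "?" marks an optional field.
CODES = {"Name": 1, "Num": 2, "BinOp": 3, "Return": 4, "Tuple": 5}
NAMES = {code: kind for kind, code in CODES.items()}
FIELDS = {
    "Name": [("id", "string")],
    "Num": [("n", "number")],
    "BinOp": [("left", "node"), ("op", "string"), ("right", "node")],
    "Return": [("value", "node?")],
    "Tuple": [("elts", "node*")],
}


def encode_value(ftype, value):
    if ftype.endswith("*"):  # sequence: 4-byte count, then the items
        body = b"".join(encode_value(ftype[:-1], v) for v in value)
        return struct.pack("<I", len(value)) + body
    if ftype.endswith("?"):  # optional: presence byte, then the value
        if value is None:
            return b"\x00"
        return b"\x01" + encode_value(ftype[:-1], value)
    if ftype == "string":  # string: 4-byte length, then UTF-8 bytes
        data = value.encode("utf-8")
        return struct.pack("<I", len(data)) + data
    if ftype == "number":  # number: 8-byte signed integer
        return struct.pack("<q", value)
    return encode_node(value)


def encode_node(node):
    kind = node[0]
    out = bytes([CODES[kind]])  # one byte naming the production
    for (_name, ftype), value in zip(FIELDS[kind], node[1:]):
        out += encode_value(ftype, value)
    return out


def decode_value(ftype, buf, pos):
    if ftype.endswith("*"):
        (count,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        items = []
        for _ in range(count):
            item, pos = decode_value(ftype[:-1], buf, pos)
            items.append(item)
        return items, pos
    if ftype.endswith("?"):
        present = buf[pos]
        pos += 1
        if not present:
            return None, pos
        return decode_value(ftype[:-1], buf, pos)
    if ftype == "string":
        (size,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        return buf[pos:pos + size].decode("utf-8"), pos + size
    if ftype == "number":
        (value,) = struct.unpack_from("<q", buf, pos)
        return value, pos + 8
    return decode_node(buf, pos)


def decode_node(buf, pos=0):
    kind = NAMES[buf[pos]]
    pos += 1
    values = []
    for _name, ftype in FIELDS[kind]:
        value, pos = decode_value(ftype, buf, pos)
        values.append(value)
    return (kind,) + tuple(values), pos


if __name__ == "__main__":
    tree = ("Return", ("BinOp", ("Name", "x"), "+", ("Num", 1)))
    data = encode_node(tree)
    print(decode_node(data)[0] == tree)
```

Because every node starts with its production code and every field has
a fixed type, the decoder needs no other framing to rebuild the tree.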

The code generation and assembler could be worked on as separate
components, but it would be hard to test them in isolation.  It's more
likely that I should finish up the assembler and get at least a
minimal full compiler working.  Once I get that much done, it
should be easier for someone to tackle a particular set of unhandled
grammar productions.

Jeremy