[Compiler-sig] Seperating out the parser

Ian Bicking bickiia@earlham.edu
Sat, 12 Feb 2000 14:43:59 -0500


I was disapointed when I didn't get a response, then it turns out I
had just lost the email in the shuffle.  Well, that's a pleasant
surprise.  Anyway...

On Wed, Feb 09, 2000 at 12:56:02PM -0500, Jeremy Hylton wrote:
> Congratulations!  You've sent the first message on the compiler-sig.
> I have been meaning to announce the sig on the main list and get
> things started here, but I've been busy with other things.
> 
> >>>>> "IB" == Ian Bicking <bickiia@earlham.edu> writes:
> 
>   IB> I'm interested in making Python run in MUQ <http://www.muq.org>.
>   IB> I was hoping to take the parser from Python and simply retarget
>   IB> the compiler.  I know there are people who would like to do the
>   IB> same for Guile.
> 
> What is MUQ?  I assume it is something that you want to generate code
> for, but I'm not familiar with it.  The Web page says something about
> virtual worlds, so I'm a bit puzzled.

The intention of MUQ is something for making MUDs, but really it's a
persistent programming environment which could be used for MUDs, but
is considerably more general than that.  It already has a few syntaxes
that compile to its bytecodes -- forth and lisp being the most
natural translations.  The C translation doesn't, well, act much like
C at all.  Which made me think that Python syntax for it would almost
have Python semantics, and provide something more C like than lisp or
forth.

> There was some discussion at SPAM8 on generating Scheme code from
> Python.  Personally, I think MzScheme is a better platform than
> Guile.

That's possible -- I only mentioned Guile because I know there are
people involved with Guile who are interested in this.  Are there 
archives for the SPAM8 list that I can read?

>   IB> I've been reading through the source in Python-1.5.2/Parser, but
>   IB> I was hoping to get a little help about where the border between
>   IB> the compiler and the parser is.  Just a few pointers as to the
>   IB> functions and datastructures that lie around that border, to
>   IB> give me a place to start from.
> 
> The border between the parser and compiler is one of the key issues
> we need to sort out.  The Python compiler (Python/compile.c) uses the
> concrete parse tree generated by the parser.  The parser is generated
> from Grammar/Grammer using pgen.  The compiler is a bit hard to read
> because it's referencing the parse tree nodes used integer indexes.

I'm afraid my knowledge of parsing/compiling is a little narrow, so
I'm not familiar with the distinction between concrete and abstract
parse trees.  But then, I've always tried to fudge my way through
parsing, and this is just another case of that since I'm really just
hoping to figure out how to *use* the already exist parser(s) rather
than implement one.  I guess I should learn a little more about these
things.  [and, having just looked up the definition, I can see why
an abstract syntax would be much more useful]

The output from the parser module in Python creates something more
of a concrete parse tree, doesn't it?  It's called an AST, but it
seems to include things like NEWLINE, which seems rather concrete.

> I'm working on a Python compiler written in Python; it generates
> essentially the same bytecode as compile.c, but it uses an abstract
> syntax tree and it is a bit more readable.  The AST I'm using is the
> one from Python2C (P2C?).  You can get it from the p2c CVS repository.
> (I can't find a URL for it; perhaps Bill or Greg can point us at it.)
> The AST is produced using the builtin Python parser;  the transformer
> module in p2c gets a parse tree using the parser module and converts
> into an AST.  I think this is a good starting point for developing a
> compiler for a different target.

http://lima.mudlib.org/~rassilon/p2c/

I'll definately be looking into this.  What I'd idealy like to do is
to seperate off the parser and make it possible to compile it into
a little C program that can print out the parse tree.  That way I'd
know I had it figured out enough to use, and I think some other
people who'd like to use this (as I mentioned with Guile) would find
that an easy place to start from.

> I expect to propose some changes to the AST when I'm done with the
> compiler.  It looks like nodes like Add, Mul, & Sub should be combined
> into a BinaryOp node, and UnaryAdd, UnarySub, etc. should become
> UnaryOp.  I'm going to wait until I've got a working version before I
> make too much noise about these changes though.  I also plan to write
> a new parser for JPython, using ANTLR to directly produce the AST.  I
> imagine that will also improve my perspective on the right AST.
> 
> I hope to have a snapshot of my code available from the Python CVS
> server early next week.

-- 
Ian Bicking         / 4869 N. Talman Ave. Apt. G, Chicago, IL 60625
bickiia@earlham.edu / http://www.cs.earlham.edu/~bickiia