[pypy-dev] Compiler Benchmarks

logistix logistix at zworg.com
Tue Feb 4 03:11:49 CET 2003



holger krekel <hpk at trillke.net> wrote:
> 
> 
> > Here's the code if you're interested.  As I mentioned, you'll need at
> > least 2.3a1.  The instructions to tie it into the compiler modules are
> > in the docstring:
> > 
> > http://members.bellatlantic.net/~olsongt/python/pparser.py
> 
> Sure looks interesting.  You do a lot of function calls, right?
> 
> > -logistix
> > 
> > P.S. If anyone wants to patch this in and run Tools/Compiler/Regrtest.py
> > against it, results would be appreciated.
> 
> i ran the Lib/test/test_parser against your module but it choked mostly
> (on missing totuple methods among other stuff).  
> 
> Regrtest is currently running with no problems. (it takes some time,
> though :-)
> 

My Regrtest run is locking up on test_pickletest (or something like that).
I also saw a bug in simple_stmt.

> If this is really a quite complete parser then pparser.pyc 
> is a lot smaller than the equivalent parsermodule.o.
> not that this means too much :-)
> 
> does your pparser work similar to parsermodule.c ? 
> 

Actually, parsermodule doesn't do any of the work.  Here's my
non-authoritative explanation of how CPython parses, as I'm still
figuring it out myself.

There's a utility, Pgen, which is Python's custom parser generator,
similar to Yacc or Bison.  It takes Grammar/Grammar as input and
generates graminit.c.  Graminit.c will hurt your brain if you try to
figure out what it's doing (it also highlights some of the perils of
automatic code generation).  Parsetok.c takes the generated grammar
tables, together with some tokenized text, and builds the AST (really a
concrete parse tree) from those two inputs.
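
You can poke at the result from Python.  Here's a rough Python 2-era
sketch against the parser/symbol/token modules (the pretty-printer is
just something I made up for illustration); it dumps the tree that
parsetok.c builds, with the node numbers from graminit.c mapped back to
readable names:

import parser, symbol, token

def pretty(node, indent=0):
    # Nonterminals are (symbol_number, child, ...); terminals are
    # (token_number, string).  The numbers come straight from graminit.c.
    kind = node[0]
    name = symbol.sym_name.get(kind) or token.tok_name.get(kind)
    if isinstance(node[1], str):
        print "  " * indent + "%s %r" % (name, node[1])
    else:
        print "  " * indent + name
        for child in node[1:]:
            pretty(child, indent + 1)

st = parser.suite("x = 1 + 2\n")    # drive the C parser
pretty(st.totuple())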

Parsermodule.c just provides a Python interface to the above internals.
The bulk of its code is validation code, so that if you try to compile
a hand-built AST, it throws an error there instead of blowing up the
compiler internals.  It's actually been really handy for debugging the
stuff I've been writing.
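
For example (again just a rough Python 2-era sketch, not anything from
pparser.py), you can round-trip a known-good tree and watch a bogus one
get rejected by that validation layer instead of reaching the compiler:

import parser

good = parser.suite("x = 1\n").totuple()   # a known-good parse tree
st = parser.tuple2ast(good)                # validation pass, rebuilds the C tree
code = parser.compileast(st)               # safe to hand to the compiler now

bad = good[1]                              # a bare stmt subtree, no file_input root
try:
    parser.tuple2ast(bad)
except parser.ParserError, err:
    print "rejected:", err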

Hand-coding the parser, like I did, is more likely to let subtle bugs
creep in, and it's harder to maintain than Pgen if your source grammar
is changing a lot.  Right now Python's grammar is reasonably stable, so
it's not too bad.  I previously tried to write a parser generator in
pure Python, but it had all kinds of bugs, and I'm still not entirely
comfortable doing serious low-level debugging in Python.  I also
noticed some comments in Pgen indicating that it uses a "unique" method
of reducing the grammar.
