[Types-sig] "Mobius2" -- high performance Python-extensible Python parser

Thu, 12 Apr 2001 15:36:07 -0500

On Thu, Apr 12, 2001 at 12:42:28PM -0700, Paul Prescod wrote:
> What happens if your new Python-like syntax requires new tokens? Is that
> handled at all?

New tokens are not currently handled, so you can't introduce "->" or
":=" as single tokens (you can still write "->" as '-' '>', of course).
PyToken_{One,Two,Three}Char could probably be re-coded to use tables
rather than switch statements, but this hasn't been done yet.  (Are these
routines performance-critical, or could the obvious approach of using a
Python mapping be used?)
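
To make the mapping idea concrete, here is a minimal sketch in Python.
It is not the tokenizer's actual code: the table contents, the token
names, and the match_operator() helper are all made up for illustration.
The point is only that a table lookup with longest-match lets a new
operator be added by inserting one entry.

```python
# Hypothetical operator table.  The names are illustrative only and are
# not taken from the real tokenizer.
OPERATORS = {
    "-": "MINUS",
    ">": "GREATER",
    ":": "COLON",
    "=": "EQUAL",
    "==": "EQEQUAL",
}


def match_operator(table, source, pos):
    """Return (name, length) for the longest operator at pos, or None."""
    longest = max((len(op) for op in table), default=0)
    # Try the longest candidate first so "->" wins over "-" when present.
    for length in range(min(longest, len(source) - pos), 0, -1):
        candidate = source[pos:pos + length]
        if candidate in table:
            return table[candidate], length
    return None


# Adding a new token is then a one-line change to the table.
extended = dict(OPERATORS)
extended["->"] = "RARROW"  # hypothetical name for the new token
```

With the original table, "->" at a given position matches only the '-';
with the extended table the same call returns the two-character token.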

You can introduce new keywords without difficulty.

Other problems:
Nothing is done to make sure that nonterminals shared with the standard
grammar have the same numbers in both grammars.
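
A check for this could be as simple as comparing two name-to-number
mappings.  The sketch below assumes each grammar can be reduced to a
dict of nonterminal names to numbers; the helper name, the "typedecl"
rule and the sample numbers are hypothetical.

```python
def mismatched_nonterminals(standard, extended):
    """Map each shared nonterminal whose number differs to both numbers."""
    shared = standard.keys() & extended.keys()
    return {
        name: (standard[name], extended[name])
        for name in shared
        if standard[name] != extended[name]
    }


# Illustrative numbering only.  Inserting a new rule ("typedecl") in the
# middle of the extended grammar shifts every nonterminal after it.
standard = {"single_input": 256, "file_input": 257,
            "eval_input": 258, "funcdef": 259}
extended = {"single_input": 256, "file_input": 257, "typedecl": 258,
            "eval_input": 259, "funcdef": 260}

for name, (old, new) in sorted(mismatched_nonterminals(standard, extended).items()):
    print("%s: %d in standard grammar, %d in extended grammar" % (name, old, new))
```

Any name reported here is one that code written against the standard
numbering would misinterpret in a parse tree from the extended grammar.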

Currently, there's no way to free a grammar, so memory is leaked
whenever a grammar would otherwise be deleted.

No documentation, and only a few examples.

Jepler wrote:
> > Only a few changes are needed to support the creation of the table at
> > runtime (and it's a fast process, 0.04 seconds to create the standard
> > grammar on a P2-350, or .25 seconds on a 486-75) and to wrap it as a
> > Python object with a .parse() method.  (calls PyParser_ParseString)

Paul:
> Is that the same as saying it would only cost us roughly 0.04 seconds
> to read Python's grammar at startup time instead of having it hard-coded
> into a C module?

Yes, I think that's the case.

> Could you foresee any arguments against putting this code in core
> Python?

Well, currently there are only a few non-.py files in the Python
standard library (pdb.doc, profile.doc, and plat-*/regen).  The grammar
is thus a new "kind" of file.

The other major kind of objection that I see is that a modifiable
grammar leads to (being a little extreme here) each site developing its
own frankenstein monster of a grammar---"you are in a twisty maze of
parrots, all different".

Of course (I can't tell if this is another objection or a counter to the
previous one) there's no point to extensibility in the parser if there's
no extensibility in the compiler.  Tools/Compiler is not yet(?) shipped
as a package in the standard library.

pgenmodule.so (stripped) + Grammar is 25987 bytes on my system, and
graminit.o (stripped) is 14244 bytes.  (11589 vs 3725 bytes when
"gzip -9"ed.)  On "embedded" Python versions, this ~8-12k difference
may be important.  One side note -- if your system uses only .pyc/.pyo
files, you could leave out the table and the generator and save 14k.
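
The "~8-12k" range is just the difference between the two pairs of
sizes quoted above, compressed and uncompressed.  Spelled out (the
variable names are mine):

```python
# Sizes in bytes, as quoted above.
runtime_raw = 25987   # pgenmodule.so (stripped) + Grammar
static_raw = 14244    # graminit.o (stripped)
runtime_gz = 11589    # same pair after "gzip -9"
static_gz = 3725

raw_diff = runtime_raw - static_raw
gz_diff = runtime_gz - static_gz

print("uncompressed difference: %d bytes" % raw_diff)
print("gzip -9 difference:      %d bytes" % gz_diff)
```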

Initialization time may be important to some.  Don't the occasional
python vs perl benchmarks always mention how perl's startup time is well
less than half of python's?  Tested on my 486 laptop:
    python -c 0  0.51s user 0.07s system 26% cpu 2.181 total
    perl -e 0  0.27s user 0.22s system 56% cpu 0.866 total
This goes double for embedded/handheld systems, some of which are
approximately as powerful as a dumb rock.
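
If you want to repeat the measurement without relying on the shell's
"time", something like the following sketch would do.  It measures
wall-clock time for the whole child process (spawn, startup, exit), so
it corresponds to the "total" column rather than user or system time.
The helper name and the default of five runs are arbitrary choices.

```python
import subprocess
import sys
import time


def startup_seconds(argv, runs=5):
    """Return the best wall-clock time over several runs of argv."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Equivalent of "python -c 0", using whichever interpreter runs this.
    elapsed = startup_seconds([sys.executable, "-c", "0"])
    print("python -c 0: %.3fs" % elapsed)
```

On a machine with perl installed, startup_seconds(["perl", "-e", "0"])
gives the other half of the comparison.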

Jeff