From jepler@users.sourceforge.net  Thu Apr 12 03:45:11 2001
From: jepler@users.sourceforge.net (Jeff Epler)
Date: Wed, 11 Apr 2001 21:45:11 -0500
Subject: [Types-sig] "Mobius2" -- high performance Python-extensible Python parser
Message-ID: <20010411220212.A3376@potty.housenet>

When the Types sig was just getting up to speed the last time, someone
expressed that there's no easy way to prototype language extensions.
I mentioned "mobius", which is a Python compiler based on a SPARK grammar.

Mobius had performance problems, and it didn't ease implementing a new
grammar in core Python because it used a different grammar syntax.

Mobius2 exposes the Python parser and parser generator to Python code.
It means that you get all the speed of the builtin parser[1], its
reliability[2] and extensibility, and easy portability to core Python.

If any types-sig proposals include language extensions, I suggest that you
take a look at Mobius2.

http://sourceforge.net/projects/mobiuspython/

The portions of interest are the "mobius2" directory, and the demo/*2
directories (currently just two simple uses of mobius2).

Jeff
jepler@inetnebr.com
jepler@sourceforge.net
[1] Actually, it's about 15% faster, and I'm not sure why.
[2] A 159-case test suite covers all productions in the core Python 2.0
    language.  All cases are verified to produce the same results (ASTs)
    as the builtin parser, which isn't terribly surprising.


From paulp@ActiveState.com  Thu Apr 12 19:19:26 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Thu, 12 Apr 2001 11:19:26 -0700
Subject: [Types-sig] "Mobius2" -- high performance Python-extensible Python
 parser
References: <20010411220212.A3376@potty.housenet>
Message-ID: <3AD5F1AE.2E3F7EDD@ActiveState.com>

Jeff Epler wrote:
> 
>...
>
> Mobius2 exposes the Python parser and parser generator to Python code.
> It means that you get all the speed of the builtin parser[1], its
> reliability[2] and extensibility, and easy portability to core Python.

This sounds really cool. The docs look pretty old so perhaps you could
give us a quick overview of how you pulled off this neat trick! Doesn't
Python do parser generation at build time? Is Mobius Python a parser
generator that depends on a compiler or does it allow you to load new
parsers at runtime with a plain old Python installation?
 
I mean if Mobius parses Python code faster than Python and it is
runtime extensible then it seems like something that should be destined
to replace Python's current parser! What's the downside?

-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook


From paulp@ActiveState.com  Thu Apr 12 20:42:28 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Thu, 12 Apr 2001 12:42:28 -0700
Subject: [Types-sig] "Mobius2" -- high performance Python-extensible Python
 parser
References: <20010411220212.A3376@potty.housenet> <3AD5F1AE.2E3F7EDD@ActiveState.com> <20010412142901.C17031@inetnebr.com>
Message-ID: <3AD60524.41E31BCD@ActiveState.com>

What happens if your new Python-like syntax requires new tokens? Is that
handled at all?

Jeff Epler wrote:
> 
>...
> 
> Only a few changes are needed to support the creation of the table at
> runtime (and it's a fast process, 0.04 seconds to create the standard
> grammar on a P2-350, or .25 seconds on a 486-75) and to wrap it as a
> Python object with a .parse() method.  (calls PyParser_ParseString)

Is that the same as saying it would only cost us roughly 0.04 seconds
to read Python's grammar at startup time instead of having it hard-coded
into a C module?

Could you foresee any arguments against putting this code in core
Python?
-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook


From jepler@inetnebr.com  Thu Apr 12 20:29:03 2001
From: jepler@inetnebr.com (Jeff Epler)
Date: Thu, 12 Apr 2001 14:29:03 -0500
Subject: [Types-sig] "Mobius2" -- high performance Python-extensible Python parser
In-Reply-To: <3AD5F1AE.2E3F7EDD@ActiveState.com>; from paulp@ActiveState.com on Thu, Apr 12, 2001 at 11:19:26AM -0700
References: <20010411220212.A3376@potty.housenet> <3AD5F1AE.2E3F7EDD@ActiveState.com>
Message-ID: <20010412142901.C17031@inetnebr.com>

On Thu, Apr 12, 2001 at 11:19:26AM -0700, Paul Prescod wrote:
> This sounds really cool. The docs look pretty old so perhaps you could
> give us a quick overview of how you pulled off this neat trick!

The docs apply to "Mobius1", the SPARK-based version.  So, yeah,
they're woefully outdated (and sparse even for what they describe).

> Doesn't Python do parser generation at build time? Is Mobius Python
> a parser generator that depends on a compiler or does it allow you to
> load new parsers at runtime with a plain old Python installation?

The Python build process constructs a parser and writes it out as a
static table.  However, when in memory the table is immediately usable
as a parser without the intervening step of being written into a C file 
and compiled.

Only a few changes are needed to support the creation of the table at
runtime (and it's a fast process, 0.04 seconds to create the standard
grammar on a P2-350, or .25 seconds on a 486-75) and to wrap it as a
Python object with a .parse() method.  (calls PyParser_ParseString)

> I mean if Mobius parses Python code faster than Python and it is
> runtime extensible then it seems like something that should be destined
> to replace Python's current parser! What's the downside?

Well, I'm confused by any difference in speed, given that the parser
code is Python's internal parser, and the tables generated should be
identical when using the standard grammar.  The only other difference
is that the tables are on the heap instead of in static variables.
Why this accounts for a 15% speed difference is beyond me.  Perhaps the
data is laid out more advantageously than it is in the generated
"graminit.c" file?  Cache effects?

Jeff


From jepler@inetnebr.com  Thu Apr 12 21:36:07 2001
From: jepler@inetnebr.com (Jeff Epler)
Date: Thu, 12 Apr 2001 15:36:07 -0500
Subject: [Types-sig] "Mobius2" -- high performance Python-extensible Python parser
In-Reply-To: <3AD60524.41E31BCD@ActiveState.com>; from paulp@activestate.com on Thu, Apr 12, 2001 at 12:42:28PM -0700
References: <20010411220212.A3376@potty.housenet> <3AD5F1AE.2E3F7EDD@ActiveState.com> <20010412142901.C17031@inetnebr.com> <3AD60524.41E31BCD@ActiveState.com>
Message-ID: <20010412153605.A17582@inetnebr.com>

On Thu, Apr 12, 2001 at 12:42:28PM -0700, Paul Prescod wrote:
> What happens if your new Python-like syntax requires new tokens? Is that
> handled at all?

New tokens are not currently handled.  So you can't introduce "->" or
":=" (you can write it as '-' '>', of course).  PyToken_{One,Two,Three}Char
could probably be re-coded to use tables rather than switch statements,
but this hasn't yet been done.  (Are these routines performance-critical,
or could the obvious approach of using a Python mapping be used?)
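
A minimal sketch of the table-driven idea, written in present-day Python
rather than C.  The operator table, the token names, and the
match_operator helper are all hypothetical; none of them is taken from
CPython's tokenizer or from Mobius2.  With a table, adding "->" or ":="
is one more dictionary entry instead of a new switch case.

```python
# Hypothetical operator table: the names are illustrative only and are
# not the values defined in CPython's token.h.
OPERATORS = {
    "-": "MINUS",
    ">": "GREATER",
    ":": "COLON",
    "=": "EQUAL",
    "==": "EQEQUAL",
    "*": "STAR",
    "**": "DOUBLESTAR",
    "**=": "DOUBLESTAREQUAL",
    "->": "RARROW",        # new token, added by extending the table
    ":=": "COLONEQUAL",    # new token, added by extending the table
}

# Longest operator in the table; bounds how far ahead we need to look.
MAX_OP_LEN = max(len(op) for op in OPERATORS)


def match_operator(text, pos=0):
    """Return (token_name, length) for the longest operator at text[pos:].

    Returns None when no operator in the table starts at that position.
    """
    for length in range(MAX_OP_LEN, 0, -1):
        candidate = text[pos:pos + length]
        # A short slice near the end of the input must not match a
        # shorter operator on a longer iteration.
        if len(candidate) == length and candidate in OPERATORS:
            return OPERATORS[candidate], length
    return None
```

The longest candidate is tried first, so "->" wins over '-' followed
by '>'.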

You can introduce new keywords without difficulty.

Other problems:
Nothing is done to make sure that nonterminals shared with the standard
grammar have the same numbers in both grammars.

Currently, there's no way to free a grammar, so some memory is leaked
when one would otherwise be deleted.

No documentation, and only a few examples.
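
One possible way to handle the symbol-number problem listed above is to
number the extended grammar from the standard one: shared nonterminals
keep their number and new ones are appended after the highest existing
number.  This is only a sketch under that assumption.  The function name
and the example numbers are hypothetical, and Mobius2 does not do this.

```python
def assign_symbol_numbers(base_symbols, extended_names):
    """Number the nonterminals of an extended grammar.

    base_symbols   -- dict mapping nonterminal name -> number in the
                      standard grammar
    extended_names -- iterable of nonterminal names in the extended
                      grammar, in any order

    Names shared with the standard grammar keep their number; new names
    get fresh numbers above every number already in use.
    """
    # 256 is where CPython's pgen-generated nonterminal numbers start;
    # it is only used here when there is no base grammar at all.
    next_number = max(base_symbols.values()) + 1 if base_symbols else 256
    numbers = {}
    for name in extended_names:
        if name in base_symbols:
            numbers[name] = base_symbols[name]
        elif name not in numbers:
            numbers[name] = next_number
            next_number += 1
    return numbers


# Illustrative numbers only, not the real values from graminit.h.
base = {"single_input": 256, "file_input": 257, "classdef": 258}
extended = assign_symbol_numbers(
    base, ["interfacedef", "file_input", "classdef"])
```

Because the numbering no longer depends on the order of rules in the
Grammar file, reordering rules would not shift the shared symbols.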

Jepler wrote:
> > Only a few changes are needed to support the creation of the table at
> > runtime (and it's a fast process, 0.04 seconds to create the standard
> > grammar on a P2-350, or .25 seconds on a 486-75) and to wrap it as a
> > Python object with a .parse() method.  (calls PyParser_ParseString)

Paul:
> Is that the same as saying it would only cost us roughly 0.04 seconds
> to read Python's grammar at startup time instead of having it hard-coded
> into a C module?

Yes, I think that's the case.

> Could you foresee any arguments against putting this code in core
> Python?

Well, currently there are only a few non-.py files in the Python
standard library (pdb.doc, profile.doc, and plat-*/regen).  The grammar
is thus a new "kind" of file.

The other major kind of objection that I see is that a modifiable
grammar leads to (being a little extreme here) each site developing its
own frankenstein monster of a grammar---"you are in a twisty maze of
parrots, all different".

Of course (I can't tell if this is another objection or a counter to #2)
there's no point to extensibility in the parser if there's no
extensibility in the compiler.  Tools/Compiler is not yet(?) shipped as
a package in the standard library.

pgenmodule.so(stripped) + Grammar is 25987 bytes on my system, and
graminit.o(stripped) is 14244 bytes on my system.  (11589 vs 3725 bytes
"gzip -9"ed)  On "embedded" Python versions, this ~8-12k  difference
may be important.  One side note -- if your system uses only .pyc/.pyo
files, you could leave out the table and the generator and save 14k.

Initialization time may be important to some.  Don't the occasional
python vs perl benchmarks always mention how perl's startup time is well
less than half of python's?  Tested on my 486 laptop:
    python -c 0  0.51s user 0.07s system 26% cpu 2.181 total
    perl -e 0  0.27s user 0.22s system 56% cpu 0.866 total
This goes double for embedded/handheld systems, some of which are
approximately as powerful as a dumb rock.

Jeff


From bob@deepware.com  Fri Apr 13 20:31:20 2001
From: bob@deepware.com (Bob Weiner)
Date: Fri, 13 Apr 2001 12:31:20 -0700 (PDT)
Subject: [Types-sig] First release (0.80) of BWCTO Interface package
In-Reply-To: <Pine.LNX.4.21.0103271157250.5336-100000@localhost.localdomain>
References: <Pine.LNX.4.21.0103271157250.5336-100000@localhost.localdomain>
Message-ID: <987190280.3ad7540819417@webmail.bchosting.com>

Any feedback on my Interface implementation for Python announced here
last month?  Comparison to the Digital Creations work?  Integration with PEP-245?

The code is at:
  http://www.deepware.com/pub/python/bwcto-interface-00.80.tgz

Regards,

Bob


From michel@digicool.com  Tue Apr 17 03:58:39 2001
From: michel@digicool.com (Michel Pelletier)
Date: Mon, 16 Apr 2001 19:58:39 -0700 (PDT)
Subject: [Types-sig] "Mobius2" -- high performance Python-extensible
 Python parser
In-Reply-To: <20010411220212.A3376@potty.housenet>
Message-ID: <Pine.LNX.4.32.0104161952390.26692-100000@localhost.localdomain>

Ok, I tried this out for a bit, trying to come up with a prototype of PEP
245.

After a couple of hours, I think I wrapped my head around it enough to
figure some stuff out.

mycodegen generates the actual bytecode.  I'm not sure how to get this
thing to say 'instantiate one of these things defined in a python module'.

mymodule is some kind of top level compiler object that I override with my
code generator and my transformer.

mynodes defines the classes for any new types of nodes I'm adding and
registers them.

I'm not sure what a transformer does, turns parsed code into an AST?  I
think I have that right.

So anyway, my blockage is, I think, in the codegen step.  How do I tell it
to create bytecodes that import a module and instantiate a class I define?
Maybe I need to read up on bytecodes more...

Any help would be great,

-Michel

I think I have it to the point where it understands my interface grammar,
I'm just not sure how to tell it to import a module to create, for
example, interface nodes.

On Wed, 11 Apr 2001, Jeff Epler wrote:

> When the Types sig was just getting up to speed the last time, someone
> expressed that there's no easy way to prototype language extensions.
> I mentioned "mobius", which is a Python compiler based on a SPARK grammar.
>
> Mobius had performance problems, and it didn't ease implementing a new
> grammar in core Python because it used a different grammar syntax.
>
> Mobius2 exposes the Python parser and parser generator to Python code.
> It means that you get all the speed of the builtin parser[1], its
> reliability[2] and extensibility, and easy portability to core Python.
>
> If any types-sig proposals include language extensions, I suggest that you
> take a look at Mobius2.
>
> http://sourceforge.net/projects/mobiuspython/
>
> The portions of interest are the "mobius2" directory, and the demo/*2
> directories (currently just two simple uses of mobius2).
>
> Jeff
> jepler@inetnebr.com
> jepler@sourceforge.net
> [1] Actually, it's about 15% faster, and I'm not sure why.
> [2] A 159-case test suite covers all productions in the core Python 2.0
>     language.  All cases are verified to produce the same results (ASTs)
>     as the builtin parser, which isn't terribly surprising.


From jepler@inetnebr.com  Tue Apr 17 13:22:31 2001
From: jepler@inetnebr.com (Jeff Epler)
Date: Tue, 17 Apr 2001 07:22:31 -0500
Subject: [Types-sig] "Mobius2" -- high performance Python-extensible Python parser
In-Reply-To: <Pine.LNX.4.32.0104161952390.26692-100000@localhost.localdomain>
References: <20010411220212.A3376@potty.housenet> <Pine.LNX.4.32.0104161952390.26692-100000@localhost.localdomain>
Message-ID: <20010417072231.A1346@potty.housenet>

On Mon, Apr 16, 2001 at 07:58:39PM -0700, Michel Pelletier wrote:
> mycodegen generates the actual bytecode.  I'm not sure how to get this
> thing to say 'instantiate one of these things defined in a python module'.

Michel,

The organization of the demos is somewhat rough, as you can see.  The
basic organization is to associate one "my*" file with one step of the
compilation process.

The process looks like this:

source code -> (tokenization, parsing) -> AST
	This step is modified by using a different Grammar file
AST -> Nodes
	This step is modified by adding new nodes to 'mynodes.py'
	*and* visiting new AST subtrees in 'myxform.py'
Nodes -> Bytecode
	This step is modified by handling new nodes in 'mycodegen.py'

The AST -> Nodes -> Bytecode steps are preexisting in Python 2.0's
Tools/compiler, though I may have chosen a clumsy way to extend them.
(In particular, things are fragile, because you must magically make your
new Grammar have the same symbol numbers as the builtin grammar for all
nonterminals which exist in both, and then get the 'addsym' statements in
'mynodes' in the right order.  I'm trying to cook up a good way to address
this in the future)

Anyhow, as to how to add code to some special spots in your new grammar ...

Take a look at demo/html2/compile_template.py (originally snatched from
Quixote).  At each file_input (the top node in a module), it adds the
equivalent of 'from IO_MODULE import IO_CLASS', and at the top of each
function it instantiates an IO_CLASS.

html2 adds some nodes for the desired code within the transformer, and then
lets the code generation module generate the associated code.

Jeff
jepler@inetnebr.com

    def file_input(self, nodelist):
        # Add a "from IO_MODULE import IO_CLASS" statement to the
        # beginning of the module.
        doc = self.get_docstring(nodelist, symbol.file_input)
        imp = Node('from', IO_MODULE, [(IO_CLASS, None)])

        # Add an IO_INSTANCE binding for module level expressions (like
        # doc strings).  This instance will not be returned.
        klass = Node('name', IO_CLASS)
        instance = Node('call_func', klass, [])
        assign_name = Node('ass_name', IO_INSTANCE, OP_ASSIGN)
        assign = Node('assign', [assign_name], instance)

        stmts = [ imp, assign ]

        for node in nodelist:
            if node[0] != token.ENDMARKER and node[0] != token.NEWLINE:
                self.com_append_stmt(stmts, node)

        return Node('module', doc, Node('stmt', stmts))

    def funcdef(self, nodelist):
        # ...
        # create an instance, assign to IO_INSTANCE
        klass = Node('name', IO_CLASS)
        instance = Node('call_func', klass, [])
        assign_name = Node('ass_name', IO_INSTANCE, OP_ASSIGN)
        assign = Node('assign', [assign_name], instance)
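
For comparison, the same injection can be sketched with the "ast" module
of present-day Python 3.  This is an analogue, not the Tools/compiler or
Mobius2 API used above: the InjectIO class and compile_with_io helper are
hypothetical names, and IO_MODULE, IO_CLASS and IO_INSTANCE are given
stand-in values so the example runs on its own.  Unlike the demo, it does
not preserve docstrings or "from __future__" imports.

```python
import ast

# Stand-in values; the html2 demo defines its own.
IO_MODULE = "io"
IO_CLASS = "StringIO"
IO_INSTANCE = "_out"


class InjectIO(ast.NodeTransformer):
    """Add an import at module level and an instance in every function."""

    def _make_instance(self):
        # Equivalent of:  IO_INSTANCE = IO_CLASS()
        call = ast.Call(func=ast.Name(id=IO_CLASS, ctx=ast.Load()),
                        args=[], keywords=[])
        return ast.Assign(
            targets=[ast.Name(id=IO_INSTANCE, ctx=ast.Store())],
            value=call)

    def visit_Module(self, node):
        self.generic_visit(node)  # transform nested functions first
        # Equivalent of:  from IO_MODULE import IO_CLASS
        imp = ast.ImportFrom(module=IO_MODULE,
                             names=[ast.alias(name=IO_CLASS, asname=None)],
                             level=0)
        node.body[:0] = [imp, self._make_instance()]
        return node

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        node.body.insert(0, self._make_instance())
        return node


def compile_with_io(source, filename="<template>"):
    """Parse, inject the IO statements, and compile to a code object."""
    tree = InjectIO().visit(ast.parse(source, filename))
    ast.fix_missing_locations(tree)  # new nodes need line numbers
    return compile(tree, filename, "exec")
```

As in the demo, the transformer only builds nodes for the desired
statements and leaves bytecode generation to the existing compiler.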


From michel@digicool.com  Tue Apr 17 17:42:05 2001
From: michel@digicool.com (Michel Pelletier)
Date: Tue, 17 Apr 2001 09:42:05 -0700 (PDT)
Subject: [Types-sig] "Mobius2" -- high performance Python-extensible
 Python parser
In-Reply-To: <20010417072231.A1346@potty.housenet>
Message-ID: <Pine.LNX.4.32.0104170934070.26692-100000@localhost.localdomain>


On Tue, 17 Apr 2001, Jeff Epler wrote:

> On Mon, Apr 16, 2001 at 07:58:39PM -0700, Michel Pelletier wrote:
> > mycodegen generates the actual bytecode.  I'm not sure how to get this
> > thing to say 'instantiate one of these things defined in a python module'.
>
> Michel,
>
> The organization of the demos is somewhat rough, as you can see.  The
> basic organization is to associate one "my*" file with one step of the
> compilation process.
>
> The process looks like this:
>
> source code -> (tokenization, parsing) -> AST
> 	This step is modified by using a different Grammar file
> AST -> Nodes
> 	This step is modified by adding new nodes to 'mynodes.py'
> 	*and* visiting new AST subtrees in 'myxform.py'
> Nodes -> Bytecode
> 	This step is modified by handling new nodes in 'mycodegen.py'

Ok, I think I have that so far.

> The AST -> Nodes -> Bytecode steps are preexisting in Python 2.0's
> Tools/compiler, though I may have chosen a clumsy way to extend them.
> (In particular, things are fragile, because you must magically make your
> new Grammar have the same symbol numbers as the builtin grammar for all
> nonterminals which exist in both, and then get the 'addsym' statements in
> 'mynodes' in the right order.  I'm trying to cook up a good way to address
> this in the future)

Yeah I ran into this one when I shuffled some of the symbols around in the
Grammar file, chaos ensued.

> Anyhow, as to how to add code to some special spots in your new grammar ...
>
> Take a look at demo/html2/compile_template.py (originally snatched from
> Quixote).  At each file_input (the top node in a module), it adds the
> equivalent of 'from IO_MODULE import IO_CLASS', and at the top of each
> function it instantiates an IO_CLASS.

Ok, I think I get it.

I am still having a problem though that I can't get past.  Here's a snippet
of my Grammar file:

interfacedef: 'interface' NAME ['(' testlist ')'] ':' suite

This is the last statement in the file and it is essentially identical to a
classdef.  Here's my module:

class MyModule(MobiusModule):
    cmagic = MAGIC
    transformer = MyTransformer
    cgen = MyCodeGenerator

MyTransformer just defines a grammar attribute and an 'interfacedef'
method that just does the same thing as 'classdef'.  Changing my Grammar
file clearly changes the behavior of my programs, so I know I'm using the
right file.  Here's my test suite:

interface Hello:
    """ Say hello. """

    def hello(name):
        """ Say hello to 'name'. """

class MyHello:

    def hello(self, name):
        return "hello %s" % name

x = MyHello()
print x.hello('world!')

But every time I try to run this script, I get a SyntaxError:

[michel@heinlein interfaces]$ python2.0 pyinterfaces.py
Traceback (most recent call last):
  File "pyinterfaces.py", line 35, in ?
    if __name__ == '__main__': import test
  File "/usr/local/lib/python2.0/ihooks.py", line 396, in import_module
    q, tail = self.find_head_package(parent, name)
  File "/usr/local/lib/python2.0/ihooks.py", line 432, in find_head_package
    q = self.import_it(head, qname, parent)
  File "/usr/local/lib/python2.0/ihooks.py", line 485, in import_it
    m = self.loader.load_module(fqname, stuff)
  File "mobius2/imphooks.py", line 119, in load_module
    return self.loader.load_compiled(name, filename, file)
  File "mobius2/imphooks.py", line 55, in load_compiled
    return self.load_source(name, source_filename)
  File "mobius2/imphooks.py", line 79, in load_source
    code = self.compiler(file, filename, output)
  File "mobius2/mcompile.py", line 66, in __call__
    template.compile()
  File "mobius2/mcompile.py", line 43, in compile
    ast = self.parse(self.source, self.filename)
  File "mobius2/mcompile.py", line 40, in parse
    return self.transformer().parsesuite(source)
  File "mobius2/mcompile.py", line 26, in parsesuite
    ast = self.grammar.suite(source + "\n", 1)
  File "mobius2/Parser.py", line 26, in suite
    return self.parse(code, lineno)
  File "mobius2/Parser.py", line 23, in parse
    return self.grammar.parse(code, start_code, lineno)
  File "<string>", line 1
    interface Hello:
            ^
SyntaxError: invalid syntax

I'm not sure why I get it, but the error is being raised in the pgen.c
code somewhere, so I can't follow it any deeper.  Now I'm stuck. ;(

I must be missing something,

-Michel


From jepler@users.sourceforge.net  Tue Apr 17 19:25:42 2001
From: jepler@users.sourceforge.net (Jeff Epler)
Date: Tue, 17 Apr 2001 13:25:42 -0500
Subject: [Types-sig] "Mobius2" -- high performance Python-extensible Python parser
In-Reply-To: <Pine.LNX.4.32.0104170934070.26692-100000@localhost.localdomain>; from michel@digicool.com on Tue, Apr 17, 2001 at 09:42:05AM -0700
References: <20010417072231.A1346@potty.housenet> <Pine.LNX.4.32.0104170934070.26692-100000@localhost.localdomain>
Message-ID: <20010417132540.A26139@inetnebr.com>

On Tue, Apr 17, 2001 at 09:42:05AM -0700, Michel Pelletier wrote:
> I am still having a problem though that I can't get past.  Here's a snippet
> of my Grammar file:
> 
> interfacedef: 'interface' NAME ['(' testlist ')'] ':' suite

You don't say whether you did this, but you need to amend the rule for
'compound_stmt':

compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | funcdef | classdef | interfacedef

Your 'myxform' file also needs to reference the grammar file:

class MyTransformer(mobius2.mcompile.MobiusTransformerMixin,
	compile_template.TemplateTransformer):
    grammar = mobius2.Parser.Parser("Grammar")
    # ...

You can try executing in immediate mode:
	import mobius2.Parser
	g = mobius2.Parser.Parser("Grammar")
	g.suite(open("interfacetest.py").read())
this should return a large nested tuple if the grammar is working.
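
A rule that no other rule references can never be reached from the start
symbol, so the parser reports a syntax error at its first keyword.  The
sketch below checks for such rules.  The dictionary is a heavily
simplified stand-in for a Grammar file, and reachable_rules and
unreachable_rules are hypothetical helpers, not part of pgen or Mobius2.

```python
def reachable_rules(grammar, start):
    """Return the set of rule names reachable from the start rule.

    grammar maps each rule name to the set of names used on its
    right-hand side.  Names without an entry (tokens such as NAME)
    are ignored.
    """
    seen = set()
    stack = [start]
    while stack:
        name = stack.pop()
        if name in seen or name not in grammar:
            continue
        seen.add(name)
        stack.extend(grammar[name])
    return seen


def unreachable_rules(grammar, start):
    """Return a sorted list of rules the start rule can never reach."""
    return sorted(set(grammar) - reachable_rules(grammar, start))


# Simplified stand-in: interfacedef is defined but never referenced.
GRAMMAR = {
    "file_input": {"stmt"},
    "stmt": {"compound_stmt"},
    "compound_stmt": {"funcdef", "classdef"},
    "funcdef": {"suite"},
    "classdef": {"suite"},
    "interfacedef": {"suite"},
    "suite": {"stmt"},
}

missing = unreachable_rules(GRAMMAR, "file_input")
```

Here missing contains only 'interfacedef'; adding it as an alternative of
compound_stmt makes the list empty.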

Jeff


From thomas.heller@ion-tof.com  Wed Apr 18 17:39:28 2001
From: thomas.heller@ion-tof.com (Thomas Heller)
Date: Wed, 18 Apr 2001 18:39:28 +0200
Subject: [Types-sig] Interesting reading
Message-ID: <04a101c0c826$230645e0$e000a8c0@thomasnotebook>

I found it interesting that the Python object model is very
similar to what Axel Tobias Schreiner describes in his book
'Objekt-orientierte Programmierung mit ANSI-C'.

An English translation is available online:

http://www.informatik.uni-osnabrueck.de/axel/books/ooc.pdf

Regards,

Thomas