From jepler@users.sourceforge.net Thu Apr 12 03:45:11 2001
From: jepler@users.sourceforge.net (Jeff Epler)
Date: Wed, 11 Apr 2001 21:45:11 -0500
Subject: [Types-sig] "Mobius2" -- high performance Python-extensible Python parser
Message-ID: <20010411220212.A3376@potty.housenet>

When the Types sig was just getting up to speed the last time, someone expressed that there's no easy way to prototype language extensions. I mentioned "mobius", which is a Python compiler based on a SPARK grammar.

Mobius had performance problems, and didn't ease the implementation of the new grammar in core Python because it used a different grammar syntax.

Mobius2 exposes the Python parser and parser generator to Python code. It means that you get all the speed of the builtin parser[1], its reliability[2] and extensibility, and easy portability to core Python.

If any types-sig proposals include language extensions, I suggest that you take a look at Mobius2.

http://sourceforge.net/projects/mobiuspython/

The portions of interest are the "mobius2" directory, and the demo/*2 directories (currently just two simple uses of mobius2).

Jeff
jepler@inetnebr.com
jepler@sourceforge.net

[1] Actually, it's about 15% faster, and I'm not sure why.
[2] A 159-case test suite covers all productions in the core Python 2.0 language. All cases are verified to produce the same results (ASTs) as the builtin parser, which isn't terribly surprising.

From paulp@ActiveState.com Thu Apr 12 19:19:26 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Thu, 12 Apr 2001 11:19:26 -0700
Subject: [Types-sig] "Mobius2" -- high performance Python-extensible Python parser
References: <20010411220212.A3376@potty.housenet>
Message-ID: <3AD5F1AE.2E3F7EDD@ActiveState.com>

Jeff Epler wrote:
>
>...
>
> Mobius2 exposes the Python parser and parser generator to Python code.
> It means that you get all the speed of the builtin parser[1], its
> reliability[2] and extensibility, and easy portability to core Python.
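The quoted claim, that Mobius2 exposes the parser generator to Python code, rests on one idea: a grammar supplied as data at runtime can be turned into a parse table and used right away, without ever generating C. Here is a minimal sketch of that idea under stated assumptions. The grammar format, `build_table` and `TableParser` are hypothetical and are not Mobius2's API. The sketch also uses a textbook FIRST/FOLLOW LL(1) table, not pgen's own method, which builds DFAs from EBNF rules.

```python
# Hypothetical toy, not Mobius2's API: the grammar is plain data, and each
# nonterminal maps to a list of alternatives ([] is the empty alternative).
GRAMMAR = {
    "expr": [["term", "expr_tail"]],
    "expr_tail": [["+", "term", "expr_tail"], []],
    "term": [["NUM"], ["(", "expr", ")"]],
}
END = "$"  # end-of-input marker


def build_table(grammar, start):
    """Compute FIRST/FOLLOW sets by fixpoint iteration, then fill an LL(1) table."""
    nullable = set()
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)

    def first_of(seq):
        """Return (FIRST set of seq, whether seq can derive the empty string)."""
        out = set()
        for sym in seq:
            if sym not in grammar:  # terminal
                out.add(sym)
                return out, False
            out |= first[sym]
            if sym not in nullable:
                return out, False
        return out, True

    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                f, empty = first_of(alt)
                if not f <= first[nt] or (empty and nt not in nullable):
                    first[nt] |= f
                    if empty:
                        nullable.add(nt)
                    changed = True
                # FOLLOW(sym) gets FIRST of whatever can come after sym in alt.
                for i, sym in enumerate(alt):
                    if sym in grammar:
                        f_rest, rest_empty = first_of(alt[i + 1:])
                        new = f_rest | (follow[nt] if rest_empty else set())
                        if not new <= follow[sym]:
                            follow[sym] |= new
                            changed = True

    table = {}
    for nt, alts in grammar.items():
        for alt in alts:
            f, empty = first_of(alt)
            for tok in f | (follow[nt] if empty else set()):
                if (nt, tok) in table:
                    raise ValueError(f"not LL(1): conflict at ({nt!r}, {tok!r})")
                table[nt, tok] = alt
    return table


class TableParser:
    """Builds its table at construction time and parses token lists with it."""

    def __init__(self, grammar, start):
        self.grammar = grammar
        self.start = start
        self.table = build_table(grammar, start)

    def parse(self, tokens):
        toks = list(tokens) + [END]
        pos = 0

        def walk(sym):
            nonlocal pos
            tok = toks[pos]
            if sym not in self.grammar:  # terminal: must match the input
                if tok != sym:
                    raise SyntaxError(f"expected {sym!r}, got {tok!r}")
                pos += 1
                return sym
            alt = self.table.get((sym, tok))
            if alt is None:
                raise SyntaxError(f"unexpected {tok!r} while parsing {sym}")
            return (sym, [walk(s) for s in alt])

        tree = walk(self.start)
        if toks[pos] != END:
            raise SyntaxError(f"trailing input at token {pos}")
        return tree
```

For example, `TableParser(GRAMMAR, "expr").parse(["NUM", "+", "NUM"])` returns a nested `(symbol, children)` tuple. You can extend the language by appending `["-", "term", "expr_tail"]` to `expr_tail` and building a new `TableParser`. The new parser accepts subtraction and nothing has to be recompiled, which is the property this thread is about.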
This sounds really cool. The docs look pretty old so perhaps you could give us a quick overview of how you pulled off this neat trick!

Doesn't Python do parser generation at build time? Is Mobius Python a parser generator that depends on a compiler or does it allow you to load new parsers at runtime with a plain old Python installation?

I mean if Mobius parses Python code faster than Python and it is runtime extensible then it seems like something that should be destined to replace Python's current parser! What's the downside?
--
Take a recipe. Leave a recipe.
Python Cookbook!
http://www.ActiveState.com/pythoncookbook

From paulp@ActiveState.com Thu Apr 12 20:42:28 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Thu, 12 Apr 2001 12:42:28 -0700
Subject: [Types-sig] "Mobius2" -- high performance Python-extensible Python parser
References: <20010411220212.A3376@potty.housenet> <3AD5F1AE.2E3F7EDD@ActiveState.com> <20010412142901.C17031@inetnebr.com>
Message-ID: <3AD60524.41E31BCD@ActiveState.com>

What happens if your new Python-like syntax requires new tokens? Is that handled at all?

Jeff Epler wrote:
>
>...
>
> Only a few changes are needed to support the creation of the table at
> runtime (and it's a fast process, 0.04 seconds to create the standard
> grammar on a P2-350, or .25 seconds on a 486-75) and to wrap it as a
> Python object with a .parse() method. (calls PyParser_ParseString)

Is that the same as saying it would only cost us roughly 0.04 seconds to read Python's grammar at startup time instead of having it hard-coded into a C module?

Could you foresee any arguments against putting this code in core Python?
--
Take a recipe. Leave a recipe.
Python Cookbook!
http://www.ActiveState.com/pythoncookbook

From jepler@inetnebr.com Thu Apr 12 20:29:03 2001
From: jepler@inetnebr.com (Jeff Epler)
Date: Thu, 12 Apr 2001 14:29:03 -0500
Subject: [Types-sig] "Mobius2" -- high performance Python-extensible Python parser
In-Reply-To: <3AD5F1AE.2E3F7EDD@ActiveState.com>; from paulp@ActiveState.com on Thu, Apr 12, 2001 at 11:19:26AM -0700
References: <20010411220212.A3376@potty.housenet> <3AD5F1AE.2E3F7EDD@ActiveState.com>
Message-ID: <20010412142901.C17031@inetnebr.com>

On Thu, Apr 12, 2001 at 11:19:26AM -0700, Paul Prescod wrote:
> This sounds really cool. The docs look pretty old so perhaps you could
> give us a quick overview of how you pulled off this neat trick!

The docs apply to "Mobius1", the SPARK-based version. So, yeah, they're woefully outdated (and sparse even for what they describe).

> Doesn't Python do parser generation at build time? Is Mobius Python
> a parser generator that depends on a compiler or does it allow you to
> load new parsers at runtime with a plain old Python installation?

The Python build process constructs a parser and writes it out as a static table. However, when in memory the table is immediately usable as a parser without the intervening step of being written into a C file and compiled.

Only a few changes are needed to support the creation of the table at runtime (and it's a fast process, 0.04 seconds to create the standard grammar on a P2-350, or .25 seconds on a 486-75) and to wrap it as a Python object with a .parse() method. (calls PyParser_ParseString)

> I mean if Mobius parses Python code faster than Python and it is
> runtime extensible then it seems like something that should be destined
> to replace Python's current parser! What's the downside?

Well, I'm confused by any difference in speed, given that the parser code is Python's internal parser, and the tables generated should be identical when using the standard grammar.
The only other difference is that the tables are on the heap instead of with static variables. Why this accounts for a 15% speed difference is beyond me. The data is laid down in a more advantageous manner than it is in the generated "graminit.c" file? Cache effects?

Jeff

From jepler@inetnebr.com Thu Apr 12 21:36:07 2001
From: jepler@inetnebr.com (Jeff Epler)
Date: Thu, 12 Apr 2001 15:36:07 -0500
Subject: [Types-sig] "Mobius2" -- high performance Python-extensible Python parser
In-Reply-To: <3AD60524.41E31BCD@ActiveState.com>; from paulp@activestate.com on Thu, Apr 12, 2001 at 12:42:28PM -0700
References: <20010411220212.A3376@potty.housenet> <3AD5F1AE.2E3F7EDD@ActiveState.com> <20010412142901.C17031@inetnebr.com> <3AD60524.41E31BCD@ActiveState.com>
Message-ID: <20010412153605.A17582@inetnebr.com>

On Thu, Apr 12, 2001 at 12:42:28PM -0700, Paul Prescod wrote:
> What happens if your new Python-like syntax requires new tokens? Is that
> handled at all?

New tokens are not currently handled. So you can't introduce "->" or ":=" (you can write it as '-' '>', of course). PyToken_{One,Two,Three}Char could probably be re-coded to use tables rather than switch statements, but this hasn't yet been done. (Are these routines performance-critical, or could the obvious approach of using a Python mapping be used?)

You can introduce new keywords without difficulty.

Other problems:

Nothing is done to make sure that nonterminals shared with the standard grammar have the same numbers in both grammars.

Currently, there's no way to free a grammar, so some memory is leaked when one would otherwise be deleted.

No documentation, and only a few examples.

Jepler wrote:
> > Only a few changes are needed to support the creation of the table at
> > runtime (and it's a fast process, 0.04 seconds to create the standard
> > grammar on a P2-350, or .25 seconds on a 486-75) and to wrap it as a
> > Python object with a .parse() method.
> > (calls PyParser_ParseString)

Paul:
> Is that the same as saying it would only cost us roughly 0.04 seconds
> to read Python's grammar at startup time instead of having it hard-coded
> into a C module?

Yes, I think that's the case.

> Could you foresee any arguments against putting this code in core
> Python?

Well, currently there are only a few non-.py files in the Python standard library (pdb.doc, profile.doc, and plat-*/regen). The grammar is thus a new "kind" of file.

The other major kind of objection that I see is that a modifiable grammar leads to (being a little extreme here) each site developing its own Frankenstein monster of a grammar---"you are in a twisty maze of parrots, all different".

Of course (I can't tell if this is another objection or a counter to #2) there's no point to extensibility in the parser if there's no extensibility in the compiler. Tools/compiler is not yet(?) shipped as a package in the standard library.

pgenmodule.so (stripped) + Grammar is 25987 bytes on my system, and graminit.o (stripped) is 14244 bytes on my system (11589 vs. 3725 bytes after "gzip -9"). On "embedded" Python versions, this ~8-12k difference may be important. One side note -- if your system uses only .pyc/.pyo files, you could leave out the table and the generator and save 14k.

Initialization time may matter to some, too: don't the occasional Python vs. Perl benchmarks always mention that Perl's startup time is well under half of Python's? Tested on my 486 laptop:

    python -c 0  0.51s user 0.07s system 26% cpu 2.181 total
    perl -e 0    0.27s user 0.22s system 56% cpu 0.866 total

This goes double for embedded/handheld systems, some of which are approximately as powerful as a dumb rock.
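Jeff asks above whether PyToken_{One,Two,Three}Char could be table-driven, or even backed by a Python mapping. The minimal sketch below illustrates that idea. It assumes operators are recognized by longest match against a dictionary. `OPERATORS`, `EXTENDED`, `match_operator` and `tokenize_ops` are hypothetical names. The token names are borrowed from Python's `token` module. None of this is the actual tokenizer.c code.

```python
# Hypothetical operator table: operator text -> token name.
# Names follow Python's `token` module; the table itself is illustrative.
OPERATORS = {
    "+": "PLUS", "-": "MINUS", "*": "STAR", "**": "DOUBLESTAR",
    "<": "LESS", ">": "GREATER", "<<": "LEFTSHIFT", "<<=": "LEFTSHIFTEQUAL",
    "=": "EQUAL", "==": "EQEQUAL", ":": "COLON",
}


def match_operator(text, pos, table):
    """Longest match: try the longest possible operator length first."""
    longest = max(len(op) for op in table)
    for length in range(min(longest, len(text) - pos), 0, -1):
        name = table.get(text[pos:pos + length])
        if name is not None:
            return name, length
    return None


def tokenize_ops(text, table=OPERATORS):
    """Split a string made only of operators and spaces into token names."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        hit = match_operator(text, pos, table)
        if hit is None:
            raise SyntaxError(f"unknown operator character {text[pos]!r} at {pos}")
        name, length = hit
        tokens.append(name)
        pos += length
    return tokens


# Adding a new multi-character token is a table edit, not a new switch case.
EXTENDED = {**OPERATORS, "->": "RARROW", ":=": "COLONEQUAL"}
```

With the stock table, "->" comes out as MINUS GREATER, which is the '-' '>' spelling described above. With `EXTENDED` it becomes a single RARROW token. Whether a dictionary lookup is fast enough for the real tokenizer is the open performance question in the message.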
Jeff

From bob@deepware.com Fri Apr 13 20:31:20 2001
From: bob@deepware.com (Bob Weiner)
Date: Fri, 13 Apr 2001 12:31:20 -0700 (PDT)
Subject: [Types-sig] First release (0.80) of BWCTO Interface package
In-Reply-To:
References:
Message-ID: <987190280.3ad7540819417@webmail.bchosting.com>

Any feedback on my Interface implementation for Python announced here last month? Comparison to the Digital Creations work? Integration with PEP-245?

The code is at: http://www.deepware.com/pub/python/bwcto-interface-00.80.tgz

Regards,

Bob

From michel@digicool.com Tue Apr 17 03:58:39 2001
From: michel@digicool.com (Michel Pelletier)
Date: Mon, 16 Apr 2001 19:58:39 -0700 (PDT)
Subject: [Types-sig] "Mobius2" -- high performance Python-extensible Python parser
In-Reply-To: <20010411220212.A3376@potty.housenet>
Message-ID:

Ok, I tried this out for a bit, trying to come up with a prototype of PEP 245. After a couple of hours, I think I wrapped my head around it enough to figure some stuff out.

mycodegen generates the actual bytecode. I'm not sure how to get this thing to say 'instantiate one of these things defined in a python module'.

mymodule is some kind of top level compiler object that I override with my code generator and my transformer.

mynodes defines the classes for any new types of nodes I'm adding and registers them.

I'm not sure what a transformer does; does it turn parsed code into an AST? I think I have that right.

So anyway, my blockage is, I think, in the codegen step. How do I tell it to create bytecodes that import a module and instantiate a class I define? Maybe I need to read up on bytecodes more...

Any help would be great,

-Michel

I think I have it to the point where it understands my interface grammar; I'm just not sure how to tell it to import a module to create, for example, interface nodes.

On Wed, 11 Apr 2001, Jeff Epler wrote:

> When the Types sig was just getting up to speed the last time, someone
> expressed that there's no easy way to prototype language extensions.
> I mentioned "mobius", which is a Python compiler based on a SPARK grammar.
>
> Mobius had performance problems, and didn't ease the implementation of
> the new grammar in core Python because it used a different grammar syntax.
>
> Mobius2 exposes the Python parser and parser generator to Python code.
> It means that you get all the speed of the builtin parser[1], its
> reliability[2] and extensibility, and easy portability to core Python.
>
> If any types-sig proposals include language extensions, I suggest that you
> take a look at Mobius2.
>
> http://sourceforge.net/projects/mobiuspython/
>
> The portions of interest are the "mobius2" directory, and the demo/*2
> directories (currently just two simple uses of mobius2).
>
> Jeff
> jepler@inetnebr.com
> jepler@sourceforge.net
> [1] Actually, it's about 15% faster, and I'm not sure why.
> [2] A 159-case test suite covers all productions in the core Python 2.0
>     language. All cases are verified to produce the same results (ASTs)
>     as the builtin parser, which isn't terribly surprising.
>
> _______________________________________________
> Types-SIG mailing list
> Types-SIG@python.org
> http://mail.python.org/mailman/listinfo/types-sig
>

From jepler@inetnebr.com Tue Apr 17 13:22:31 2001
From: jepler@inetnebr.com (Jeff Epler)
Date: Tue, 17 Apr 2001 07:22:31 -0500
Subject: [Types-sig] "Mobius2" -- high performance Python-extensible Python parser
In-Reply-To:
References: <20010411220212.A3376@potty.housenet>
Message-ID: <20010417072231.A1346@potty.housenet>

On Mon, Apr 16, 2001 at 07:58:39PM -0700, Michel Pelletier wrote:
> mycodegen generates the actual bytecode. I'm not sure how to get this
> thing to say 'instantiate one of these things defined in a python module'.

Michel,

The organization of the demos is somewhat rough, as you can see. The basic organization is to associate one "my*" file with one step of the compilation process.
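Michel's question, quoted at the top of this reply, asks how to make generated code import a module and instantiate a class. The demo answers it by adding ordinary import and call nodes in the transformer and letting the existing code generator emit the bytecode. Below is a minimal sketch of that idea. It uses Python 3's `ast` module, not the 2.0-era Tools/compiler package that Mobius2 builds on. `IO_MODULE`, `IO_CLASS` and `IO_INSTANCE` are set to `io`, `StringIO` and `_out` only so the example runs. `InjectIO` and `compile_with_io` are hypothetical names.

```python
import ast
import dis

# Hypothetical stand-ins for html2's IO_MODULE, IO_CLASS and IO_INSTANCE,
# chosen only so that the generated code can actually run.
IO_MODULE, IO_CLASS, IO_INSTANCE = "io", "StringIO", "_out"


def make_instance_assign():
    """Build the tree for `IO_INSTANCE = IO_CLASS()`."""
    call = ast.Call(func=ast.Name(id=IO_CLASS, ctx=ast.Load()), args=[], keywords=[])
    return ast.Assign(targets=[ast.Name(id=IO_INSTANCE, ctx=ast.Store())], value=call)


class InjectIO(ast.NodeTransformer):
    """Add the import and a module-level instance, plus one instance per function."""

    def visit_Module(self, node):
        self.generic_visit(node)  # rewrite the functions first
        imp = ast.ImportFrom(
            module=IO_MODULE, names=[ast.alias(name=IO_CLASS, asname=None)], level=0
        )
        node.body[:0] = [imp, make_instance_assign()]
        return node

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # also handles nested functions
        node.body.insert(0, make_instance_assign())
        return node


def compile_with_io(source, filename="<template>"):
    """Parse, transform the tree, then let compile() emit the bytecode."""
    tree = InjectIO().visit(ast.parse(source, filename))
    ast.fix_missing_locations(tree)  # injected nodes need line numbers
    return compile(tree, filename, "exec")


if __name__ == "__main__":
    src = "def render(x):\n    _out.write(str(x))\n    return _out.getvalue()\n"
    code = compile_with_io(src)
    dis.dis(code)  # shows IMPORT_NAME / IMPORT_FROM for the injected import
    ns = {}
    exec(code, ns)
    print(ns["render"](42))
```

Running the module prints the disassembly followed by `42`. The design choice is that nothing below the tree level needs to know about the injected statements: the compiler emits the import and call bytecode as if the user had written them.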
The process looks like this:

    source code -> (tokenization, parsing) -> AST
        This step is modified by using a different Grammar file
    AST -> Nodes
        This step is modified by adding new nodes to 'mynodes.py'
        *and* visiting new AST subtrees in 'myxform.py'
    Nodes -> Bytecode
        This step is modified by handling new nodes in 'mycodegen.py'

The AST -> Nodes -> Bytecode steps are preexisting in Python 2.0's Tools/compiler, though I may have chosen a clumsy way to extend them. (In particular, things are fragile, because you must magically make your new Grammar have the same symbol numbers as the builtin grammar for all nonterminals which exist in both, and then get the 'addsym' statements in 'mynodes' in the right order. I'm trying to cook up a good way to address this in the future.)

Anyhow, as to how to add code to some special spots in your new grammar ...

Take a look at demo/html2/compile_template.py (originally snatched from Quixote). At each file_input (the top node in a module), it adds the equivalent of 'from IO_MODULE import IO_CLASS', and at the top of each function it instantiates an IO_CLASS. html2 adds some nodes for the desired code within the transformer, and then lets the code generation module generate the associated code.

Jeff
jepler@inetnebr.com

    def file_input(self, nodelist):
        # Add a "from IO_MODULE import IO_CLASS" statement to the
        # beginning of the module.
        doc = self.get_docstring(nodelist, symbol.file_input)
        imp = Node('from', IO_MODULE, [(IO_CLASS, None)])

        # Add an IO_INSTANCE binding for module level expressions (like
        # doc strings). This instance will not be returned.
        klass = Node('name', IO_CLASS)
        instance = Node('call_func', klass, [])
        assign_name = Node('ass_name', IO_INSTANCE, OP_ASSIGN)
        assign = Node('assign', [assign_name], instance)

        stmts = [ imp, assign ]

        for node in nodelist:
            if node[0] != token.ENDMARKER and node[0] != token.NEWLINE:
                self.com_append_stmt(stmts, node)

        return Node('module', doc, Node('stmt', stmts))

    def funcdef(self, nodelist):
        # ...
        # create an instance, assign to IO_INSTANCE
        klass = Node('name', IO_CLASS)
        instance = Node('call_func', klass, [])
        assign_name = Node('ass_name', IO_INSTANCE, OP_ASSIGN)
        assign = Node('assign', [assign_name], instance)

From michel@digicool.com Tue Apr 17 17:42:05 2001
From: michel@digicool.com (Michel Pelletier)
Date: Tue, 17 Apr 2001 09:42:05 -0700 (PDT)
Subject: [Types-sig] "Mobius2" -- high performance Python-extensible Python parser
In-Reply-To: <20010417072231.A1346@potty.housenet>
Message-ID:

On Tue, 17 Apr 2001, Jeff Epler wrote:

> On Mon, Apr 16, 2001 at 07:58:39PM -0700, Michel Pelletier wrote:
> > mycodegen generates the actual bytecode. I'm not sure how to get this
> > thing to say 'instantiate one of these things defined in a python module'.
>
> Michel,
>
> The organization of the demos is somewhat rough, as you can see. The
> basic organization is to associate one "my*" file with one step of the
> compilation process.
>
> The process looks like this:
>
>     source code -> (tokenization, parsing) -> AST
>         This step is modified by using a different Grammar file
>     AST -> Nodes
>         This step is modified by adding new nodes to 'mynodes.py'
>         *and* visiting new AST subtrees in 'myxform.py'
>     Nodes -> Bytecode
>         This step is modified by handling new nodes in 'mycodegen.py'

Ok, I think I have that so far.

> The AST -> Nodes -> Bytecode steps are preexisting in Python 2.0's
> Tools/compiler, though I may have chosen a clumsy way to extend them.
> (In particular, things are fragile, because you must magically make your
> new Grammar have the same symbol numbers as the builtin grammar for all
> nonterminals which exist in both, and then get the 'addsym' statements in
> 'mynodes' in the right order. I'm trying to cook up a good way to address
> this in the future.)

Yeah, I ran into this one when I shuffled some of the symbols around in the Grammar file; chaos ensued.

> Anyhow, as to how to add code to some special spots in your new grammar ...
>
> Take a look at demo/html2/compile_template.py (originally snatched from
> Quixote). At each file_input (the top node in a module), it adds the
> equivalent of 'from IO_MODULE import IO_CLASS', and at the top of each
> function it instantiates an IO_CLASS.

Ok, I think I get it.

I am still having a problem though that I can't get past. Here's a snippet of my Grammar file:

    interfacedef: 'interface' NAME ['(' testlist ')'] ':' suite

This is the last statement in the file and it is essentially identical to a classdef. Here's my module:

    class MyModule(MobiusModule):
        cmagic = MAGIC
        transformer = MyTransformer
        cgen = MyCodeGenerator

MyTransformer just defines a grammar attribute and an 'interfacedef' method that just does the same thing as 'classdef'. Changing my Grammar file clearly changes the behavior of my programs, so I know I'm using the right file.

Here's my test suite:

    interface Hello:
        """ Say hello. """

        def hello(name):
            """ Say hello to 'name'. """

    class MyHello:

        def hello(self, name):
            return "hello %s" % name

    x = MyHello()
    print x.hello('world!')

But every time I try to run this script, I get a SyntaxError:

    [michel@heinlein interfaces]$ python2.0 pyinterfaces.py
    Traceback (most recent call last):
      File "pyinterfaces.py", line 35, in ?
        if __name__ == '__main__': import test
      File "/usr/local/lib/python2.0/ihooks.py", line 396, in import_module
        q, tail = self.find_head_package(parent, name)
      File "/usr/local/lib/python2.0/ihooks.py", line 432, in find_head_package
        q = self.import_it(head, qname, parent)
      File "/usr/local/lib/python2.0/ihooks.py", line 485, in import_it
        m = self.loader.load_module(fqname, stuff)
      File "mobius2/imphooks.py", line 119, in load_module
        return self.loader.load_compiled(name, filename, file)
      File "mobius2/imphooks.py", line 55, in load_compiled
        return self.load_source(name, source_filename)
      File "mobius2/imphooks.py", line 79, in load_source
        code = self.compiler(file, filename, output)
      File "mobius2/mcompile.py", line 66, in __call__
        template.compile()
      File "mobius2/mcompile.py", line 43, in compile
        ast = self.parse(self.source, self.filename)
      File "mobius2/mcompile.py", line 40, in parse
        return self.transformer().parsesuite(source)
      File "mobius2/mcompile.py", line 26, in parsesuite
        ast = self.grammar.suite(source + "\n", 1)
      File "mobius2/Parser.py", line 26, in suite
        return self.parse(code, lineno)
      File "mobius2/Parser.py", line 23, in parse
        return self.grammar.parse(code, start_code, lineno)
      File "", line 1
        interface Hello:
                  ^
    SyntaxError: invalid syntax

I'm not sure why I get it, but the error is being raised in the pgen.c code somewhere, so I can't follow it any deeper. Now I'm stuck. ;(

I must be missing something,

-Michel

From jepler@users.sourceforge.net Tue Apr 17 19:25:42 2001
From: jepler@users.sourceforge.net (Jeff Epler)
Date: Tue, 17 Apr 2001 13:25:42 -0500
Subject: [Types-sig] "Mobius2" -- high performance Python-extensible Python parser
In-Reply-To: ; from michel@digicool.com on Tue, Apr 17, 2001 at 09:42:05AM -0700
References: <20010417072231.A1346@potty.housenet>
Message-ID: <20010417132540.A26139@inetnebr.com>

On Tue, Apr 17, 2001 at 09:42:05AM -0700, Michel Pelletier wrote:
> I am still having a problem though that I can't get past.
> Here's a snippet of my Grammar file:
>
>     interfacedef: 'interface' NAME ['(' testlist ')'] ':' suite

You don't say whether you did this, but you need to amend the rule for 'compound_stmt':

    compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | funcdef | classdef | interfacedef

Your 'myxform' file also needs to reference the grammar file:

    class MyTransformer(mobius2.mcompile.MobiusTransformerMixin,
                        compile_template.TemplateTransformer):
        grammar = mobius2.Parser.Parser("Grammar")
        # ...

You can try executing in immediate mode:

    import mobius2.Parser
    g = mobius2.Parser.Parser("Grammar")
    g.suite(open("interfacetest.py").read())

This should return a large nested tuple if the grammar is working.

Jeff

From thomas.heller@ion-tof.com Wed Apr 18 17:39:28 2001
From: thomas.heller@ion-tof.com (Thomas Heller)
Date: Wed, 18 Apr 2001 18:39:28 +0200
Subject: [Types-sig] Interesting reading
Message-ID: <04a101c0c826$230645e0$e000a8c0@thomasnotebook>

I found it interesting that the Python object model is very similar to what Axel Tobias Schreiner describes in his book 'Objekt-orientierte Programmierung mit ANSI-C'. An English translation is available online:

http://www.informatik.uni-osnabrueck.de/axel/books/ooc.pdf

Regards,

Thomas