From jeremy@alum.mit.edu Wed Apr 10 05:04:08 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Wed, 10 Apr 2002 00:04:08 -0400 (EDT)
Subject: [Compiler-sig] progress on new AST
Message-ID: <0GUC009TO3AVOB@mtaout03.icomcast.net>

I've been working on a new AST defined in ASDL (the Zephyr abstract syntax definition language). I've checked in the current work in python/nondist/sandbox/ast.

There is a python.asdl that defines an AST that is reasonably complete, although it has rough edges (slices, etc.). I've also written a simple C code generator that turns the ast definition into C code that defines structs and constructor functions.

I think the next step is to work on a transformer that translates the concrete syntax into the AST. I'd also like to write code to generate ASDL pickles, but that's a lower priority.

Note that while there is a next step, I don't think any of the earlier steps is done. I expect the AST will change several times before we're done. I expect the C representation will also change. I believe, for example, that Guido prefers something more like the current representation, which uses a variable-length array of child pointers and the NCH(), CHILD(), and REQ() macros. I've been trying to avoid this style, because it leaves the author stuck with a bunch of small integers in place of names.

Anyone interested in pitching in? I'd be happy to have feedback or help.

Jeremy

From skip@pobox.com Wed Apr 10 05:23:36 2002
From: skip@pobox.com (Skip Montanaro)
Date: Tue, 9 Apr 2002 23:23:36 -0500
Subject: [Compiler-sig] progress on new AST
In-Reply-To: <0GUC009TO3AVOB@mtaout03.icomcast.net>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net>
Message-ID: <15539.48712.780052.112027@12-248-41-177.client.attbi.com>

Jeremy> I've been working on a new AST defined in ASDL (the Zephyr
Jeremy> abstract syntax definition language). I've checked in the
Jeremy> current work in python/nondist/sandbox/ast.
...
Jeremy> Anyone interested in pitching in? I'd be happy to have feedback
Jeremy> or help.

What needs to be pitched? I'm generally more familiar with pitching stuff out, but not in a software setting.

Unfamiliar as I am with where this is headed, I will abstract a post to c.l.py from a couple days ago that has so far gone unanswered and ask a question. How difficult is it to change the parser so that the following:

    guettli@sonne:~/tmp$ python ~/scripts/replace_recursive.py
      File "/home/guettli/scripts/replace_recursive.py", line 17
        in=open(temp)
          ^
    SyntaxError: invalid syntax

will print "SyntaxError: invalid syntax. 'in' is a reserved word"?

Will the new ASDL code eventually lead to more user-friendly error messages and decent enough error recovery that it won't have to give up after the first syntax error it encounters?

Skip

From jeremy@zope.com Wed Apr 10 13:55:26 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Wed, 10 Apr 2002 08:55:26 -0400
Subject: [Compiler-sig] progress on new AST
In-Reply-To: <15539.48712.780052.112027@12-248-41-177.client.attbi.com>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net> <15539.48712.780052.112027@12-248-41-177.client.attbi.com>
Message-ID: <15540.13886.642125.41974@slothrop.zope.com>

>>>>> "SM" == Skip Montanaro writes:

SM> What needs to be pitched? I'm generally more familiar with
SM> pitching stuff out, but not in a software setting.

Reviewing the AST to make sure it accurately describes Python. Once I get started on the transformer, there will be lots of code to write. I don't know how easy it will be to split that task up.

SM> Will the new ASDL code eventually lead to more user-friendly
SM> error messages and decent enough error recovery that it won't
SM> have to give up after the first syntax error it encounters?

Unfortunately, no. The ASDL stuff describes the AST -- a compiler intermediate representation. The error recovery needs to be added to the parser.
Jeremy

From skip@pobox.com Wed Apr 10 15:40:16 2002
From: skip@pobox.com (Skip Montanaro)
Date: Wed, 10 Apr 2002 09:40:16 -0500
Subject: [Compiler-sig] progress on new AST
In-Reply-To: <15540.13886.642125.41974@slothrop.zope.com>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net> <15539.48712.780052.112027@12-248-41-177.client.attbi.com> <15540.13886.642125.41974@slothrop.zope.com>
Message-ID: <15540.20176.479968.66870@12-248-41-177.client.attbi.com>

SM> What needs to be pitched? I'm generally more familiar with pitching
SM> stuff out, but not in a software setting.

Jeremy> Reviewing the AST to make sure it accurately describes Python.
Jeremy> Once I get started on the transformer, there will be lots of
Jeremy> code to write. I don't know how easy it will be to split that
Jeremy> task up.

Reading? I think I can read.

SM> Will the new ASDL code eventually lead to more user-friendly error
SM> messages and decent enough error recovery that it won't have to give
SM> up after the first syntax error it encounters?

Jeremy> Unfortunately, no. The ASDL stuff describes the AST -- a
Jeremy> compiler intermediate representation. The error recovery needs
Jeremy> to be added to the parser.

My mistake. I was looking at the checkin messages and thinking I was looking at grammar changes.

Skip

From bckfnn@worldonline.dk Wed Apr 10 18:05:25 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Wed, 10 Apr 2002 17:05:25 GMT
Subject: [Compiler-sig] progress on new AST
In-Reply-To: <0GUC009TO3AVOB@mtaout03.icomcast.net>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net>
Message-ID: <3cb46b73.25619278@mail.wanadoo.dk>

[Jeremy Hylton]

>I've been working on a new AST defined in ASDL (the Zephyr abstract
>syntax definition language). I've checked in the current work in
>python/nondist/sandbox/ast.

Thanks.

>There is a python.asdl that defines an AST that is reasonably
>complete, although it has rough edges (slices, etc.).
Keep in mind that I'm a newbie at reading asdl, but how is it expressed that a 'Module' contains a list of 'stmts', while a FunctionDef only contains one 'name'?

>I've also
>written a simple C code generator that turns the ast definition into C
>code that defines structs and constructor functions.

I'm playing around with generating java code and all the needed information seems to be available, but I can't quite make sense of the basic idea behind the datastructures we are generating from. What is a Sum and what is a Product in this sense?

regards,
finn

From jeremy@zope.com Wed Apr 10 23:55:18 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Wed, 10 Apr 2002 18:55:18 -0400
Subject: [Compiler-sig] progress on new AST
In-Reply-To: <3cb46b73.25619278@mail.wanadoo.dk>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net> <3cb46b73.25619278@mail.wanadoo.dk>
Message-ID: <15540.49878.108194.364668@slothrop.zope.com>

>>>>> "FB" == Finn Bock writes:

FB> [Jeremy Hylton]
>> There is a python.asdl that defines an AST that is reasonably
>> complete, although it has rough edges (slices, etc.).

FB> Keep in mind that I'm a newbie at reading asdl,

I'd recommend you read Dan Wang's DSL 97 paper: http://www.cs.princeton.edu/~danwang/Papers/dsl97/dsl97-abstract.html. It's an easy read. It describes the ASDL syntax and shows small examples of an AST and code generated for C and Java.

FB> Keep in mind that I'm a newbie at reading asdl, but how is it
FB> expressed that a 'Module' contains a list of 'stmts', while a
FB> FunctionDef only contains one 'name'?

Your question pointed out an embarrassing bug in the python.asdl file :-). If we take an example "constructor" (with fix applied):

    stmt = ClassDef(identifier name, expr* bases, stmt* body)

The lhs is the name of the type, the rhs is a constructor signature. The constructor takes three arguments. The type is on the left, the name is on the right. identifier is a builtin type. expr and stmt are defined in python.asdl.
There are two type modifiers, * and ?. The * means a sequence of 0 or more. The ? means optional.

So a class has a single name, an arbitrary number of base class expressions, and an arbitrary number of stmts.

The bug is that Module, FunctionDef, and ClassDef were defined to contain a single statement. I'm sure that's what confused you.

>> I've also written a simple C code generator that turns the ast
>> definition into C code that defines structs and constructor
>> functions.

FB> I'm playing around with generating java code and all the needed
FB> information seems to be available, but I can't quite make sense
FB> of the basic idea behind the datastructures we are generating
FB> from. What is a Sum and what is a Product in this sense?

A Sum is a set of type constructors -- so stmt is a sum type. A Product is like listcomp -- a single unnamed constructor. For a sum type, a value can be any one of the constructors. For a product, there is only one constructor.

The DSL paper represents a sum as a C union with a struct element for each constructor. It is silent on products, but I've chosen to represent it as a single struct.

Feel free to check in any Java-generating code in the sandbox.

Jeremy

From bckfnn@worldonline.dk Thu Apr 11 13:38:13 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Thu, 11 Apr 2002 12:38:13 GMT
Subject: [Compiler-sig] progress on new AST
In-Reply-To: <15540.49878.108194.364668@slothrop.zope.com>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net> <3cb46b73.25619278@mail.wanadoo.dk> <15540.49878.108194.364668@slothrop.zope.com>
Message-ID: <3cb56519.1150524@mail.wanadoo.dk>

[Jeremy]
>I'd recommend you read Dan Wang's DSL 97 paper:
>http://www.cs.princeton.edu/~danwang/Papers/dsl97/dsl97-abstract.html.
>It's an easy read. It describes the ASDL syntax and shows small
>examples of an AST and code generated for C and Java.

Thanks.
>
> FB> Keep in mind that I'm a newbie at reading asdl, but how is it
> FB> expressed that a 'Module' contains a list of 'stmts', while a
> FB> FunctionDef only contains one 'name'?
>
>Your question pointed out an embarrassing bug in the python.asdl file
>:-). If we take an example "constructor" (with fix applied):
>
>    stmt = ClassDef(identifier name, expr* bases, stmt* body)
>
>The lhs is the name of the type, the rhs is a constructor signature.
>The constructor takes three arguments. The type is on the left, the
>name is on the right. identifier is a builtin type. expr and stmt
>are defined in python.asdl. There are two type modifiers * and ?.
>The * means sequence of 0 or more. The ? means optional.
>
>So a class has a single name, an arbitrary number of base class
>expressions, and an arbitrary number of stmts.
>
>The bug is that Module, FunctionDef, and ClassDef were defined to
>contain a single statement. I'm sure that's what confused you.

Indeed, I couldn't quite make it add up. I then guess the same problem still exists for the remaining uses of 'stmt' in For, While, If, TryExcept and TryFinally?

Will the optional 'else:' part of For, While and If be handled as a zero-length list of stmts? Or maybe the optional '?' operator can be used for an optional sequence?

> >> I've also written a simple C code generator that turns the ast
> >> definition into C code that defines structs and constructor
> >> functions.
>
> FB> I'm playing around with generating java code and all the needed
> FB> information seems to be available, but I can't quite make sense
> FB> of the basic idea behind the datastructures we are generating
> FB> from. What is a Sum and what is a Product in this sense?
>
>A Sum is a set of type constructors -- so stmt is a sum type. A
>Product is like listcomp -- a single unnamed constructor. For a sum
>type, a value can be any one of the constructors. For a product,
>there is only one constructor.

Thanks, that helped.
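To make the Sum/Product distinction and the * and ? modifiers concrete, here is a minimal Python 3.7+ sketch of how a few of the constructors discussed above might map onto classes. It is an illustration under assumptions, not the sandbox's generated code: where the DSL paper uses a C union for a sum type, this sketch swaps in an abstract base class with one subclass per constructor, and the `Name(identifier id)` shape is assumed from the AST dumps later in the thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Sum types: one abstract base per ASDL type, one subclass per constructor.
class stmt:
    pass


class expr:
    pass


@dataclass
class Name(expr):
    id: str                       # identifier -> exactly one value


@dataclass
class Return(stmt):
    # stmt = ... | Return(expr? value)
    value: Optional[expr] = None  # expr? -> optional (None when absent)


@dataclass
class ClassDef(stmt):
    # stmt = ClassDef(identifier name, expr* bases, stmt* body)
    name: str                                         # identifier
    bases: List[expr] = field(default_factory=list)   # expr* -> zero or more
    body: List[stmt] = field(default_factory=list)    # stmt* -> zero or more


# Product type: a single unnamed constructor, so just one class
# (analogous to the single C struct mentioned above).
@dataclass
class keyword:
    # keyword = (identifier arg, expr value)
    arg: str
    value: expr


cls = ClassDef("C", [Name("Base")], [Return()])
print(isinstance(cls, stmt), len(cls.bases), cls.body[0].value)  # True 1 None
```

A value of a sum type is "any one of the constructors" here because every constructor class is a subclass of the same base; a product has exactly one class and no base to choose among.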
>The DSL paper represents a sum as a C union with a struct element for
>each constructor. It is silent on products, but I've chosen to
>represent it as a single struct.

>Feel free to check in any Java-generating code in the sandbox.

Will do, eventually.

There are some restrictions on the java code, typically naming, that I have to deal with somehow, and I'm not sure what can be changed in the .asdl and what must be handled in my generator.

- Would it be OK to rename the 'final' arg in TryFinally to e.g. 'finalbody'? 'final' is a java reserved word.

- Would it be OK to change the name of 'String' and 'Number'? Java classes with these names already exist in the java.lang package and it is annoying to work with user classes with these names.

For example:

Index: python.asdl
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/ast/python.asdl,v
retrieving revision 1.8
diff -u -r1.8 python.asdl
--- python.asdl  10 Apr 2002 23:03:32 -0000  1.8
+++ python.asdl  11 Apr 2002 12:34:31 -0000
@@ -24,7 +24,7 @@
          -- 'type' is a bad name
          | Raise(expr? type, expr? inst, expr? tback)
          | TryExcept(stmt body, except* handlers)
-         | TryFinally(stmt body, stmt final)
+         | TryFinally(stmt body, stmt finalbody)
          | Assert(expr test, expr? msg)
 
          -- may want to factor this differently perhaps excluding
@@ -59,8 +59,8 @@
               expr? starargs, expr? kwargs)
          | Repr(expr value)
          | Lvalue(assign lvalue)
-         | Number(string n) -- string representation of a number
-         | String(string s) -- need to specify raw, unicode, etc?
+         | Num(string n) -- string representation of a number
+         | Str(string s) -- need to specify raw, unicode, etc?
          -- other literals? bools?
 
    -- the subset of expressions that are valid as the target of

regards,
finn

From bckfnn@worldonline.dk Thu Apr 11 20:53:54 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Thu, 11 Apr 2002 19:53:54 GMT
Subject: [Compiler-sig] progress on new AST
In-Reply-To: <0GUC009TO3AVOB@mtaout03.icomcast.net>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net>
Message-ID: <3cb5e798.34557581@mail.wanadoo.dk>

[Jeremy]
>I've been working on a new AST defined in ASDL ...

Another question: What's with the Lvalue constructor? It strikes me as a somewhat strange way of describing an 'expr'. Is it just a consequence of some asdl limitation? Is there a reason for having an Lvalue node in the AST?

regards,
finn

From jeremy@zope.com Thu Apr 11 21:15:51 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Thu, 11 Apr 2002 16:15:51 -0400
Subject: [Compiler-sig] progress on new AST
In-Reply-To: <3cb5e798.34557581@mail.wanadoo.dk>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net> <3cb5e798.34557581@mail.wanadoo.dk>
Message-ID: <15541.61175.477803.278880@slothrop.zope.com>

>>>>> "FB" == Finn Bock writes:

FB> [Jeremy]
>> I've been working on a new AST defined in ASDL ...

FB> Another question: What's with the Lvalue constructor? It strikes
FB> me as a somewhat strange way of describing an 'expr'. Is it just
FB> a consequence of some asdl limitation? Is there a reason for
FB> having an Lvalue node in the AST?

The Lvalue node captures the notion that a limited subset of expressions can occur in two contexts -- as an expression or as the target of an assignment. A constructor can appear in only one type; otherwise its type would be ambiguous. Example: Name() can be an expression and the target of an assignment. A ListComp() is an expression, but cannot be assigned to. The extra Lvalue() constructor captures the distinction.
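The sketch below models, in Python, how an `Lvalue(assign lvalue)` constructor keeps assignable and non-assignable expressions apart when each ASDL type is a class. The class layout, the cut-down `ListComp` with a single `elt` field, and the runtime `TypeError` check in `Assign` are illustrative assumptions, not code from the sandbox.

```python
from dataclasses import dataclass
from typing import List


class expr:          # ASDL sum type 'expr'
    pass


class assign:        # ASDL sum type 'assign': the valid assignment targets
    pass


@dataclass
class Name(assign):
    id: str


@dataclass
class Attribute(assign):
    value: expr
    attr: str


@dataclass
class Lvalue(expr):
    # expr = ... | Lvalue(assign lvalue): lets a target also be read as an expr
    lvalue: assign


@dataclass
class ListComp(expr):
    elt: expr        # simplified: assumed single field for illustration


@dataclass
class Num(expr):
    n: str


@dataclass
class Assign:
    # stmt = ... | Assign(assign* targets, expr value)
    targets: List[assign]
    value: expr

    def __post_init__(self):
        # Emulate the type check: only 'assign' constructors may be targets.
        for t in self.targets:
            if not isinstance(t, assign):
                raise TypeError("can't assign to " + type(t).__name__)


# "A.b" read as an expression needs an Lvalue wrapper at each level.
a_dot_b = Lvalue(Attribute(Lvalue(Name("A")), "b"))
ok = Assign([Name("x")], a_dot_b)
try:
    Assign([ListComp(Num("1"))], Num("2"))
except TypeError as e:
    print(e)   # can't assign to ListComp
```

The nested wrappers in `a_dot_b` show the cost of this encoding, while the `isinstance` check shows the distinction it buys: a `ListComp` is an `expr` but never an `assign`.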
Jeremy

From jeremy@zope.com Thu Apr 11 21:41:38 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Thu, 11 Apr 2002 16:41:38 -0400
Subject: [Compiler-sig] progress on new AST
In-Reply-To: <3cb56519.1150524@mail.wanadoo.dk>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net> <3cb46b73.25619278@mail.wanadoo.dk> <15540.49878.108194.364668@slothrop.zope.com> <3cb56519.1150524@mail.wanadoo.dk>
Message-ID: <15541.62722.977090.518394@slothrop.zope.com>

The suggested changes look fine. I'll add them today.

Jeremy

From bckfnn@worldonline.dk Sat Apr 13 14:01:58 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sat, 13 Apr 2002 13:01:58 GMT
Subject: [Compiler-sig] Lvalue
In-Reply-To: <15541.61175.477803.278880@slothrop.zope.com>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net> <3cb5e798.34557581@mail.wanadoo.dk> <15541.61175.477803.278880@slothrop.zope.com>
Message-ID: <3cb803f6.4679168@mail.wanadoo.dk>

[Jeremy]
>The Lvalue node captures the notion that a limited subset of
>expressions can occur in two contexts -- as an expression or as the
>target of an assignment.

Ok, but is it important to express that notion in the AST typesystem? I'm sure you have given this more thought than I have, but I tend to disagree. Maybe it is just that I loathe seeing an AST like:

    :
    Expr[value=Lvalue[lvalue=Attribute[value=Lvalue[lvalue=Name[id=A]], attr=b]]]

to capture the expression "A.b".

I would prefer an asdl without the "assign" type and instead:

    | Del(expr* targets)
    | Assign(expr* targets, expr value)
    | AugAssign(expr target, operator op, expr value)

Yes, I know that would allow a user to manually create a semantically incorrect AST tree, but IMO that is what TypeErrors are good at expressing.

Just my 2 cents.

regards,
finn

From bckfnn@worldonline.dk Sat Apr 13 14:02:40 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sat, 13 Apr 2002 13:02:40 GMT
Subject: [Compiler-sig] if .. elif:
In-Reply-To: <0GUC009TO3AVOB@mtaout03.icomcast.net>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net>
Message-ID: <3cb829ba.14347340@mail.wanadoo.dk>

Hi

Maybe I'm missing something, but I think the If() constructor is a bit too simple to handle a list of 'elif:' parts.

    If(expr test, stmt* body, stmt* orelse)

This here is my take on a solution:

Index: python.asdl
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/ast/python.asdl,v
retrieving revision 1.9
diff -w -u -r1.9 python.asdl
--- python.asdl  11 Apr 2002 21:20:19 -0000  1.9
+++ python.asdl  13 Apr 2002 12:55:43 -0000
@@ -19,7 +19,7 @@
          -- need a better solution for that
          | For(expr target, expr iter, stmt* body, stmt* orelse)
          | While(expr test, stmt* body, stmt* orelse)
-         | If(expr test, stmt* body, stmt* orelse)
+         | If(ifpart* tests, stmt* orelse)
 
          -- 'type' is a bad name
          | Raise(expr? type, expr? inst, expr? tback)
@@ -96,4 +96,6 @@
 
     -- keyword arguments supplied to call
     keyword = (identifier arg, expr value)
+
+    ifpart = (expr test, stmt* body)
 }

regards,
finn

From bckfnn@worldonline.dk Sat Apr 13 17:35:59 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sat, 13 Apr 2002 16:35:59 GMT
Subject: [Compiler-sig] Handler suite in except.
Message-ID: <3cb85bc7.27160504@mail.wanadoo.dk>

Hi,

Another rough edge: I can't see where the except: code block should go. I'm guessing that 'except' should read:

    except = (expr type, identifier? name, stmt* body)

?
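As an illustration of why an except handler needs its own body, and why its type is better made optional (a bare `except:` has none), here is a hedged Python sketch. `ExceptHandler` and `select_handler` are hypothetical names, and the matching rule (first handler whose type matches, with `None` catching everything) mirrors Python's runtime behaviour rather than anything defined in python.asdl.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExceptHandler:
    # except = (expr? type, identifier? name, stmt* body)
    # exc_type stands in for the 'type' field; None models a bare "except:".
    exc_type: Optional[type] = None
    name: Optional[str] = None
    body: List[str] = field(default_factory=list)  # statements, kept as text


def select_handler(handlers, exc):
    """Return the first handler that would catch exc, or None."""
    for h in handlers:
        if h.exc_type is None or isinstance(exc, h.exc_type):
            return h
    return None


handlers = [
    ExceptHandler(KeyError, "e", ["print e"]),
    ExceptHandler(None, None, ["raise"]),   # bare except: catches the rest
]
print(select_handler(handlers, KeyError("k")).name)   # e
print(select_handler(handlers, ValueError()).body)    # ['raise']
```

Because handlers are tried in order, a bare handler only makes sense last; the sketch does not enforce that, it just returns the first match.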
regards,
finn

From bckfnn@worldonline.dk Sat Apr 13 22:13:43 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sat, 13 Apr 2002 21:13:43 GMT
Subject: [Compiler-sig] Import and ImportFrom
Message-ID: <3cb89c85.43734336@mail.wanadoo.dk>

Hi,

I think a refactoring of Import is required in order to support a list of aliases:

    from p import a, b as c, d as e
    import a, b as c, d as e

How about this definition:

    | Import(alias* names)
    | ImportFrom(identifier module, alias* names)

    alias = (identifier name, identifier? asname)

?

regards,
finn

From jeremy@zope.com Sun Apr 14 05:04:33 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Sun, 14 Apr 2002 00:04:33 -0400
Subject: [Compiler-sig] Re: Lvalue
In-Reply-To: <3cb803f6.4679168@mail.wanadoo.dk>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net> <3cb5e798.34557581@mail.wanadoo.dk> <15541.61175.477803.278880@slothrop.zope.com> <3cb803f6.4679168@mail.wanadoo.dk>
Message-ID: <15544.65489.151544.482696@slothrop.zope.com>

>>>>> "FB" == Finn Bock writes:

FB> [Jeremy]
>> The Lvalue node captures the notion that a limited subset of
>> expressions can occur in two contexts -- as an expression or as
>> the target of an assignment.

FB> Ok, but is it important to express that notion in the AST
FB> typesystem? I'm sure you have given this more thought than I
FB> have, but I tend to disagree. Maybe it is just that I loathe
FB> seeing an AST like:

FB> :
FB> Expr[value=Lvalue[lvalue=Attribute[value=Lvalue[lvalue=Name[id=A]],
FB> attr=b]]]

FB> to capture the expression "A.b".

I was (am?) undecided about whether the typesystem should express the limitation on expressions that are targets of assignments. The example above seems a strong argument against an explicit lvalue type.

I wonder, though, if we might have a separate set of constructors for the assign type. Perhaps instead of LValue we have AssignAttribute, AssignName, etc. The AST from the compiler package in the std library uses this approach, although I don't like the names it uses.
(I don't like the names I just used either.)

It seems useful to distinguish the two cases, because they are handled differently inside the compiler. The bytecode generated is different, and there needs to be some way for the code generator to distinguish the cases. Using LValue or just Expr means that the code generator needs to track context explicitly to see if the LValue is being used as an expression or an assignment. If the constructor is different for the two cases, there is no need to track the context separately.

Jeremy

From jeremy@zope.com Sun Apr 14 05:09:53 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Sun, 14 Apr 2002 00:09:53 -0400
Subject: [Compiler-sig] Re: if .. elif:
In-Reply-To: <3cb829ba.14347340@mail.wanadoo.dk>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net> <3cb829ba.14347340@mail.wanadoo.dk>
Message-ID: <15545.273.951129.448987@slothrop.zope.com>

>>>>> "FB" == Finn Bock writes:

FB> Hi Maybe I'm missing something, but I think the If() constructor
FB> is a bit too simple to handle a list of 'elif:' parts.

FB> If(expr test, stmt* body, stmt* orelse)

I was expecting to encode 'elif' parts as a series of new If() constructors in the orelse slot.

    if x == 1:
        print 1
    elif x == 2:
        print 2
    else:
        print 3

    If(Compare(Lvalue(Name(x)), Num("1")),
       [Print(NULL, [Num("1")], False)],
       [If(Compare(Lvalue(Name(x)), Num("2")),
           [Print(NULL, [Num("2")], False)],
           [Print(NULL, [Num("3")], False)])])

Does that make sense? (And, yuck, the Lvalue is a pain.)

Jeremy

From bckfnn@worldonline.dk Sun Apr 14 11:10:15 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sun, 14 Apr 2002 10:10:15 GMT
Subject: [Compiler-sig] Changes to TyrExcept
Message-ID: <3cb953a5.3802147@mail.wanadoo.dk>

Hi,

I have checked in a fix to TryExcept and except. If I have misunderstood how it should be used, please scold me gently.
regards,
finn

Index: python.asdl
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/ast/python.asdl,v
retrieving revision 1.10
diff -w -u -r1.10 python.asdl
--- python.asdl  14 Apr 2002 09:23:01 -0000  1.10
+++ python.asdl  14 Apr 2002 10:01:10 -0000
@@ -23,7 +23,7 @@
 
          -- 'type' is a bad name
          | Raise(expr? type, expr? inst, expr? tback)
-         | TryExcept(stmt* body, except* handlers)
+         | TryExcept(stmt* body, except* handlers, stmt* orelse)
          | TryFinally(stmt* body, stmt* finalbody)
          | Assert(expr test, expr? msg)
 
@@ -82,7 +82,7 @@
 
     -- not sure what to call the first argument for raise and except
 
-    except = (expr type, identifier? name)
+    except = (expr? type, assign? name, stmt* body)
 
     -- XXX need to handle 'def f((a, b)):'
     arguments = (identifier* args, identifier? vararg,

From bckfnn@worldonline.dk Sun Apr 14 14:47:32 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sun, 14 Apr 2002 13:47:32 GMT
Subject: [Compiler-sig] funcdef parameters
Message-ID: <3cb96341.7797742@mail.wanadoo.dk>

Hi,

I'm trying to understand the 'arguments' production.

    arguments = (identifier* args, identifier? vararg,
                 identifier? kwarg, expr* defaults)

First, I'll ignore the possibility of tuple parameters. I'm guessing that 'vararg' is what is called 'starargs' in the Call ctor. I'm also guessing that 'defaults' contains the keyword values like this:

    def foo(a, b, c=1, d=2, *lst, **kw): pass

    -> arguments([a, b, c, d], lst, kw, [1, 2])

or maybe?

    arguments([a, b, c, d], lst, kw, [None, None, 1, 2])

Second, we have to add tuple parameters, and the simplest way I can see looks like this.

@@ -85,8 +85,11 @@
     except = (expr? type, assign? name, stmt* body)
 
     -- XXX need to handle 'def f((a, b)):'
-    arguments = (identifier* args, identifier? vararg,
+    arguments = (fpdef* args, identifier? vararg,
                  identifier? kwarg, expr* defaults)
+
+    fpdef = FpList(fpdef* list)
+          | FpName(identifier id)
 
     -- keyword arguments supplied to call
     keyword = (identifier arg, expr value)

regards,
finn

From bckfnn@worldonline.dk Sun Apr 14 14:49:13 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sun, 14 Apr 2002 13:49:13 GMT
Subject: [Compiler-sig] Re: if .. elif:
In-Reply-To: <15545.273.951129.448987@slothrop.zope.com>
References: <0GUC009TO3AVOB@mtaout03.icomcast.net> <3cb829ba.14347340@mail.wanadoo.dk> <15545.273.951129.448987@slothrop.zope.com>
Message-ID: <3cb98861.17302519@mail.wanadoo.dk>

[Jeremy]
>I was expecting to encode 'elif' parts as a series of new If()
>constructors in the orelse slot.
>
>if x == 1:
>    print 1
>elif x == 2:
>    print 2
>else:
>    print 3
>
>If(Compare(Lvalue(Name(x)), Num("1")),
>   [Print(NULL, [Num("1")], False)],
>   [If(Compare(Lvalue(Name(x)), Num("2")),
>       [Print(NULL, [Num("2")], False)],
>       [Print(NULL, [Num("3")], False)])])
>
>Does that make sense?

I guess it will work, but I don't like it much. It doesn't feel honest to the actual python syntax.

regards,
finn

From bckfnn@worldonline.dk Sun Apr 14 16:23:57 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sun, 14 Apr 2002 15:23:57 GMT
Subject: [Compiler-sig] Question about slice
Message-ID: <3cb98fb0.19172778@mail.wanadoo.dk>

Hi,

Yet another question, this time about the slice ctors and how they should be used. I made a little change locally:

-         | ExtSlice(expr* dims)
+         | ExtSlice(slice* dims)

and I have used the Slice and ExtSlice like this (only showing the actual slice):

    L[1]      --> Slice[lower=Num[n=1], upper=null]
    L[1:2]    --> Slice[lower=Num[n=1], upper=Num[n=2]]
    L[1:2, 3] --> ExtSlice[dims=[
                      Slice[lower=Num[n=1], upper=Num[n=2]],
                      Slice[lower=Num[n=3], upper=null]
                  ]]

Is that about right? It seems to work OK, except that jython happens to support a step argument in its slice syntax.

    Jython 2.1+ on java1.4.0-beta3 (JIT: null)
    Type "copyright", "credits" or "license" for more information.
    >>> "1234567890"[::2]
    '13579'
    >>>

How do you feel about adding a step argument to the Slice ctor?

regards,
finn

From bckfnn@worldonline.dk Sun Apr 14 16:43:19 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sun, 14 Apr 2002 15:43:19 GMT
Subject: [Compiler-sig] Jython progres
Message-ID: <3cb99f68.23197065@mail.wanadoo.dk>

Hi,

I can now transform all the standard Lib .py files from CPython with the AST tree builder I have made for jython. I have used a slightly modified python.asdl, but most of the changes have been discussed here previously.

A cute little detail about my approach is that I have no intermediate parse tree structure. Instead I can create the AST nodes directly.

Obviously I can't generate java bytecode from the AST yet; that is the next phase I will work on (and I'm sure that bugs will surface when I start).

Below are the changes I made. I can work without the 'ifpart' change, but the rest should IMO be committed in some form.

regards,
finn

Index: python.asdl
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/ast/python.asdl,v
retrieving revision 1.11
diff -w -u -r1.11 python.asdl
--- python.asdl  14 Apr 2002 10:10:02 -0000  1.11
+++ python.asdl  14 Apr 2002 15:28:07 -0000
@@ -6,7 +6,7 @@
 
     stmt = FunctionDef(identifier name, arguments args, stmt* body)
          | ClassDef(identifier name, expr* bases, stmt* body)
-         | Return(expr value) | Yield(expr value)
+         | Return(expr? value) | Yield(expr value)
 
          | Del(assign* targets)
          | Assign(assign* targets, expr value)
@@ -19,7 +19,7 @@
          -- need a better solution for that
          | For(expr target, expr iter, stmt* body, stmt* orelse)
          | While(expr test, stmt* body, stmt* orelse)
-         | If(expr test, stmt* body, stmt* orelse)
+         | If(ifpart* tests, stmt* orelse)
 
          -- 'type' is a bad name
          | Raise(expr? type, expr? inst, expr? tback)
@@ -66,7 +66,7 @@
 
     slice = Ellipsis | Slice(expr? lower, expr? upper)
           -- maybe Slice and ExtSlice should be merged...
-         | ExtSlice(expr* dims)
+         | ExtSlice(slice* dims)
 
     boolop = And | Or
 
@@ -85,11 +85,17 @@
     except = (expr? type, assign? name, stmt* body)
 
     -- XXX need to handle 'def f((a, b)):'
-    arguments = (identifier* args, identifier? vararg,
+    arguments = (fpdef* args, identifier? vararg,
                  identifier? kwarg, expr* defaults)
 
+    fpdef = FpList(fpdef* list)
+          | FpName(identifier id)
+
     -- keyword arguments supplied to call
     keyword = (identifier arg, expr value)
+
+    ifpart = (expr test, stmt* body)
+
     -- import name with optional 'as' alias.
     alias = (identifier name, identifier? asname)

From bckfnn@worldonline.dk Sun Apr 14 21:45:50 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Sun, 14 Apr 2002 20:45:50 GMT
Subject: [Compiler-sig] Visitor pattern
Message-ID: <3cb9e7dc.41745046@mail.wanadoo.dk>

Hi,

Any thoughts about how a visitor pattern should be added to the AST nodes? I took a quick look at the visitor in the 'compiler' package but it wasn't immediately obvious to me how it works.

My own thoughts go like this (I'm clearly thinking in java here). In each AST node a method is generated:

    public Object accept(Visitor visitor) throws Exception {
        return visitor.visit_ClassDef(this);
    }

and a Visitor interface is generated like this:

    public interface Visitor {
        public Object visit_Module(Module node) throws Exception;
        public Object visit_FunctionDef(FunctionDef node) throws Exception;
        public Object visit_ClassDef(ClassDef node) throws Exception;
        ...
    }

As a result, the visitor implementation is itself responsible for calling the .accept() method on all its children and there is no default recursion.

Something as simple as that would fulfill the needs for visitor patterns as used by jython itself, but if you are thinking about something more powerful I might as well use that in jython's CodeCompiler.
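A Python transliteration of the Java scheme just described may help when comparing it with the compiler package's walker. Each node class gets its own `accept()`, which calls the matching `visit_*` method, and the concrete visitor drives all recursion itself. The three node classes and the `NameCollector` visitor are hypothetical examples, not generated code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Name:
    id: str

    def accept(self, visitor):
        return visitor.visit_Name(self)


@dataclass
class Expr:
    value: object

    def accept(self, visitor):
        return visitor.visit_Expr(self)


@dataclass
class Module:
    body: List[object]

    def accept(self, visitor):
        return visitor.visit_Module(self)


class NameCollector:
    """Concrete visitor: there is no default recursion, so it recurses itself."""

    def __init__(self):
        self.names = []

    def visit_Module(self, node):
        for stmt in node.body:
            stmt.accept(self)

    def visit_Expr(self, node):
        node.value.accept(self)

    def visit_Name(self, node):
        self.names.append(node.id)


tree = Module([Expr(Name("a")), Expr(Name("b"))])
collector = NameCollector()
tree.accept(collector)
print(collector.names)   # ['a', 'b']
```

In Java the generated `Visitor` interface forces every `visit_*` method to be implemented; in this Python version a missing method only shows up as an `AttributeError` when a node of that type is reached.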
regards,
finn

From jeremy@zope.com Mon Apr 15 01:50:26 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Sun, 14 Apr 2002 20:50:26 -0400
Subject: [Compiler-sig] Visitor pattern
In-Reply-To: <3cb9e7dc.41745046@mail.wanadoo.dk>
References: <3cb9e7dc.41745046@mail.wanadoo.dk>
Message-ID: <15546.9170.111452.105217@slothrop.zope.com>

I've been meaning to get the compiler package's visitor documented, but I'm not yet sure where to start. (That is, I'm not sure which things are novel or unusual and which are obvious.)

I think there are three key features:

  - Embed the name of the class being visited in the visitor method. This wouldn't be necessary in Java, I assume, because each method would have the same name but its argument list would be different.

  - The AST walker exploits knowledge of the AST structure to provide a default visit() method that will work on any node type.

  - Responsibility for implementing the pattern is divided between a visitor and a walker.

The visitor implements a visitXXX() method for each node of interest. The walker takes a visitor instance and the root of a node graph. It applies the visitor methods to the node graph starting with the root.

The second seems to apply regardless of language and can be quite convenient. If you're writing a simple symbol table visitor, you only care about a few of the node types. The If stmt, e.g., has no effect on the symbol table, only its children do. The default method makes it possible to write a visitor that only has methods for the nodes it cares about.

If we have a full specification of the AST (we do: python.asdl) and we assume that the spec includes the children of each node in a "natural" order, then we can generate a visitor method automatically. By natural, I mean a pre-order traversal of the tree, which has always been what I've needed.

In the compiler package, each node type has a method getChildren() that returns a tuple containing the children in the natural order.
    class If:

        def getChildren(self):
            return self.test, self.body, self.orelse

We should be able to define these as well, since the python.asdl presents the children in the natural order.

If we have a visitor that isn't interested in If nodes, then it simply doesn't define a visitIf() method. If the AST contains an If node, the default method of the walker handles it instead.

The default method on the walker does the following:

    for child in node.getChildren():
        if child is not None:
            self.visit(child)

The visit() method is defined by the walker. It takes an arbitrary node as its argument, looks up the appropriate method on the visitor, and calls it. The visitor can also use this method to delegate responsibility for method lookup to the walker.

Does that help? I'm not sure how much of this maps naturally from Python to Java.

Jeremy

From bckfnn@worldonline.dk Mon Apr 15 15:01:34 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Mon, 15 Apr 2002 14:01:34 GMT
Subject: [Compiler-sig] Visitor pattern
In-Reply-To: <15546.9170.111452.105217@slothrop.zope.com>
References: <3cb9e7dc.41745046@mail.wanadoo.dk> <15546.9170.111452.105217@slothrop.zope.com>
Message-ID: <3cbab5ce.6552301@mail.wanadoo.dk>

[Jeremy]
>I've been meaning to get the compiler package's visitor documented,
>but I'm not yet sure where to start. (That is, I'm not sure which
>things are novel or unusual and which are obvious.)
>
>I think there are three key features:
>
>  - Embed the name of the class being visited in the visitor
>    method. This wouldn't be necessary in Java, I assume, because
>    each method would have the same name but its argument list
>    would be different.

Java can do that to some extent, but let's ignore that and just decide that the visitor has methods like visitAssign() and visitIf().

>
>  - The AST walker exploits knowledge of the AST structure to
>    provide a default visit() method that will work on any node
>    type.

I think it is the walker idea that threw me off.
When I hear 'visitor' together with trees, I immediately think of the visitor pattern in the GOF book, and there is no walker in that pattern.

>  - Responsibility for implementing the pattern is divided between a
>    visitor and a walker.

And with no responsibility placed in the nodes (other than the getChildren() method), right? That is OK, but it is a bit of a stretch to call it a visitor (IMHO).

>The visitor implements a visitXXX() method for each node of interest.
>The walker takes a visitor instance and the root of a node graph. It
>applies the visitor methods to the node graph starting with the root.
>
>The second seems to apply regardless of language and can be quite
>convenient. If you're writing a simple symbol table visitor, you only
>care about a few of the node types. The If stmt, e.g., has no effect
>on the symbol table, only its children do. The default method makes
>it possible to write a visitor that only has methods for the nodes it
>cares about.
>
>If we have a full specification of the AST (we do: python.asdl) and we
>assume that the spec includes the children of each node in a "natural"
>order, then we can generate a visitor method automatically. By
>natural, I mean a pre-order traversal of the tree, which has always
>been what I've needed.

That is interesting; I can think of only one case where I would need an automatic traversal: when detecting fastlocals. When generating code I need to control the traversal order myself.

>In the compiler package, each node type has a method getChildren()
>that returns a tuple containing the children in the natural order.
>
>class If:
>
>    def getChildren(self):
>        return self.test, self.body, self.orelse

Yuck!
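For reference, the walker design Jeremy describes above (visitXXX methods looked up by class name, plus a default method that recurses through getChildren()) can be sketched in a few lines of modern Python. This is a hypothetical illustration, not the compiler package's code: the node classes, Walker, walk() and NameCollector are invented for the example, and If.body is modeled as a Stmt node wrapping a list.

```python
# Hypothetical sketch of the visitor/walker split; not the compiler package's code.

class Node:
    def getChildren(self):
        # Leaf nodes have no children.
        return ()


class Name(Node):
    def __init__(self, id):
        self.id = id


class Stmt(Node):
    """A sequence of statements, modeled as a single node."""
    def __init__(self, nodes):
        self.nodes = nodes

    def getChildren(self):
        return tuple(self.nodes)


class If(Node):
    def __init__(self, test, body, orelse=None):
        self.test = test
        self.body = body
        self.orelse = orelse

    def getChildren(self):
        # Children in the "natural" (pre-order) order.
        return self.test, self.body, self.orelse


class Walker:
    def __init__(self, visitor):
        self.visitor = visitor
        self._cache = {}  # node class -> visitor method, or self.default

    def default(self, node):
        # Used for any node type the visitor does not handle itself.
        for child in node.getChildren():
            if child is not None:
                self.visit(child)

    def visit(self, node):
        cls = node.__class__
        meth = self._cache.get(cls)
        if meth is None:
            meth = getattr(self.visitor, "visit" + cls.__name__, self.default)
            self._cache[cls] = meth
        return meth(node)


def walk(tree, visitor):
    walker = Walker(visitor)
    # Let visitor methods hand lookup back to the walker via self.visit(child).
    visitor.visit = walker.visit
    walker.visit(tree)
    return visitor


class NameCollector:
    """Only defines visitName(); If and Stmt fall through to Walker.default()."""
    def __init__(self):
        self.names = []

    def visitName(self, node):
        self.names.append(node.id)


tree = Stmt([If(Name("x"), Stmt([Name("y")]), Stmt([Name("z")])), Name("w")])
print(walk(tree, NameCollector()).names)  # ['x', 'y', 'z', 'w']
```

NameCollector never mentions If or Stmt. If it defined a visitIf() method, that method would take over If nodes and would have to call self.visit() on whichever children it still wanted walked.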
For performance reasons in jython, I would much rather call a method on the node that does the traversing:

    class If:
        def traverse(self, walker):
            walker.dispatch(self.test)
            for stmt in self.body:
                walker.dispatch(stmt)
            for stmt in self.orelse:
                walker.dispatch(stmt)

That is because calling a method can be cheaper than creating a tuple. I suppose that I will do something like that and then hide the 'traverse' method from python.

>We should be able to define these as well, since the python.asdl
>presents the children in the natural order.
>
>If we have a visitor that isn't interested in If nodes, then it simply
>doesn't define a visitIf() method. If the AST contains an If node,
>the default method of the walker handles it instead.
>
>The default method on the walker does the following:
>
>    for child in node.getChildren():
>        if child is not None:
>            self.visit(child)

With my 'If' class above, the 'default' method becomes:

    node.traverse(self)

>The visit() method is defined by the walker. It takes an arbitrary
>node as its argument, looks up the appropriate method on the
>visitor, and calls it. The visitor can also use this method to
>delegate responsibility for method lookup to the walker.
>
>Does that help?

Yes it did, thanks.

A question still remains: if I have a visitTryExcept() method, how would I then cause or prevent the default traversal of the children? In the example below I assume that a visitXXX function must deal with its children itself, either one by one or by calling 'default()'.

>I'm not sure how much of this maps naturally from Python to Java.

I'm guessing the main problem will be whether the concrete visitor class must inherit from some base class. I like that requirement; you probably don't. The requirement of a visitor base class is easily met if the visitor and walker are joined into one class.
That way an example visitor to find potential fastlocals would look like this:

    class FastLocalVisitor(ASTVisitor):
        def __init__(self):
            self.infunc = False
            self.mode = 'GET'

        def visitFuncDef(self, node):
            self.infunc = True
            self.default(node)
            self.infunc = False

        def visitName(self, node):
            if self.infunc and self.mode == 'SET':
                print node.id, "is a potential fastlocal"

        def visitTryExcept(self, node):
            self.mode = 'SET'
            for e in node.handlers:
                self.dispatch(e.name)
            self.mode = 'GET'
            for e in node.handlers:
                self.dispatch(e.type)
                self.dispatch(e.body)
            self.dispatch(node.body)
            self.dispatch(node.orelse)

        def visitAssign(self, node):
            self.mode = 'SET'
            for t in node.targets:
                self.dispatch(t)
            self.mode = 'GET'
            self.dispatch(node.value)

        def visitFor(self, node):
            self.mode = 'SET'
            self.dispatch(node.target)
            self.mode = 'GET'
            self.dispatch(node.iter)
            self.dispatch(node.body)
            self.dispatch(node.orelse)

    FastLocalVisitor().dispatch(tree)

So the ASTVisitor class is publicly defined as:

    class ASTVisitor:
        def default(self, node):    # Should be called traverse IMO
            """Visit each of the children of node."""

        def dispatch(self, node):   # Should be called visit IMO
            """Visit the node (or call self.default())."""

        def visitXXX(self, node):
            """Visit the node of type XXX. Defined in subclass."""

If we want to get fancy, we can consider these methods:

        def post_visitXXX(self, node):
            """Visit the node of type XXX after the children have
            been visited. Defined in subclass."""

        def unhandled_node(self, node):
            """Called when no visitXXX method exists for the node.
            Defined in subclass."""

        def open_level(self, node):
            """Called just before visitXXX is called. Defined in subclass."""

        def close_level(self, node):
            """Called just after post_visitXXX is called. Defined in subclass."""

I would also like to avoid documenting the use of cls.__name__ in the dispatching. The cls.__name__ for a java class should not be depended on, because it looks like this: 'org.python.parser.ast.If'.
I can deal with this in the implementation of ASTVisitor.dispatch(), but I wouldn't want other applications to depend too much on the __name__ of AST nodes. It would be better if we added a class attribute to the nodes that contained the official name.

regards,
finn

From bckfnn@worldonline.dk Tue Apr 16 10:56:59 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Tue, 16 Apr 2002 09:56:59 GMT
Subject: [Compiler-sig] Assign nodes
In-Reply-To:
References:
Message-ID: <3cbbe809.10645387@mail.wanadoo.dk>

[Jeremy, in a checkin msg]
>Instead of extra Lvalue production, have separate constructors for the
>assign type. This makes sense because the code generated for an
>expression as the target of an assignment is quite different than an
>expression elsewhere.

An assign node in a 'del' statement is also quite different from an assignment, so would you want a special set of nodes for deletions as well (DelAttribute, DelSubscript, DelName, DelList and DelTuple)?

>+     -- the subset of expressions that are valid as the target of
>+     -- assignments.
>+     assign = AssignAttribute(expr value, identifier attr)
>+            | AssignSubscript(expr value, slice slice)
>+            | AssignName(identifier id)
>+            | AssignList(expr* elts) | AssignTuple(expr *elts)

I think that should be:

>+            | AssignList(assign* elts) | AssignTuple(assign *elts)

And I don't like this change one single bit. Yes sure, it looks prettier than using Lvalue and it captures useful information, but it makes a one-pass AST builder a lot harder to do.

You probably don't care because you have an intermediate parse tree with all the context needed to know if you must create a Name node or an AssignName node.

When my parser detects a name, I have no way of knowing if an equal sign will turn up later. So I'll have to pick one node type (like Name) and rebuild that part of the tree later if it turned out that it was part of an assignment or deletion. Ugly.
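The fix-up a bottom-up builder needs can be sketched as follows, assuming each node type that can appear as a target carries a mutable context flag (the 'Get'/'Set'/'Del' flag Finn argues for, which Jeremy's later patch spells expr_context). The node classes and the set_context() and make_assign() helpers are hypothetical. The point is that the builder can create a Name as a load and then flip one attribute per node once it sees the '=', instead of recreating the subtree with AssignName-style classes.

```python
# Hypothetical node classes carrying an expr_context-style flag.
LOAD, STORE, DEL = "Load", "Store", "Del"


class Num:
    def __init__(self, n):
        self.n = n


class Name:
    def __init__(self, id, ctx=LOAD):
        self.id = id
        self.ctx = ctx


class Attribute:
    def __init__(self, value, attr, ctx=LOAD):
        self.value = value
        self.attr = attr
        self.ctx = ctx


class Tuple:
    def __init__(self, elts, ctx=LOAD):
        self.elts = elts
        self.ctx = ctx


class Assign:
    def __init__(self, targets, value):
        self.targets = targets
        self.value = value


def set_context(node, ctx):
    """Re-tag an expression that was built as a load so that it becomes a
    store (assignment) or del target, without rebuilding the subtree."""
    if isinstance(node, (Name, Attribute)):
        # Only the outer node changes; Attribute.value is still evaluated.
        node.ctx = ctx
    elif isinstance(node, Tuple):
        node.ctx = ctx
        for elt in node.elts:
            set_context(elt, ctx)
    else:
        raise SyntaxError("invalid %s target: %s" % (ctx, type(node).__name__))
    return node


def make_assign(target, value):
    # A bottom-up builder calls this only once it has seen the '='.
    return Assign([set_context(target, STORE)], value)


stmt = make_assign(Tuple([Name("a"), Attribute(Name("obj"), "x")]), Name("b"))
print([elt.ctx for elt in stmt.targets[0].elts])  # ['Store', 'Store']
```

The same set_context() call with DEL would serve a 'del' statement, which is the case Finn raises for DelName-style node types.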
As I have said before, I think the desire for correct typing in the tree is overrated, and I would rather remove the 'assign' sum altogether and add a flag to the Attribute, Subscript, Name, List and Tuple nodes that captured the use of the node (like 'Get', 'Set' and 'Del').

regards,
finn

From skip@pobox.com Tue Apr 16 15:55:59 2002
From: skip@pobox.com (Skip Montanaro)
Date: Tue, 16 Apr 2002 09:55:59 -0500
Subject: [Compiler-sig] dipping my toe in...
Message-ID: <15548.15231.393441.547286@12-248-41-177.client.attbi.com>

    -- 'type' is a bad name
    | Raise(expr? type, expr? inst, expr? tback)

How about "exc" instead?

Does the "?" imply the preceding fields must be present if that is? For example, if we have an "inst", does that imply we also have a "type" the same way optional args work in Python?

Skip

From jeremy@zope.com Tue Apr 16 16:45:31 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Tue, 16 Apr 2002 11:45:31 -0400
Subject: [Compiler-sig] Assign nodes
In-Reply-To: <3cbbe809.10645387@mail.wanadoo.dk>
References: <3cbbe809.10645387@mail.wanadoo.dk>
Message-ID: <15548.18203.915142.66377@slothrop.zope.com>

>>>>> "FB" == Finn Bock writes:

  FB> And I don't like this change one single bit. Yes sure, it looks
  FB> prettier than using Lvalue and it captures useful information,
  FB> but it makes a one-pass AST builder a lot harder to do.

I felt it simplified the code generator for the compiler package, but I don't have any experience with a one-pass AST builder. So it's hard for me to judge what the tradeoffs are. You haven't written a code generator, right? If so, we've each got experience with one end of the problem, but not with both.

  FB> You probably don't care because you have an intermediate parse
  FB> tree with all the context needed to know if you must create a
  FB> Name node or an AssignName node.

I'd be interested in taking a look at the one-pass AST builder. Eventually, I'd like to have one for CPython.
  FB> When my parser detects a name, I have no way of knowing if an
  FB> equal sign will turn up later. So I'll have to pick one node
  FB> type (like Name) and rebuild that part of the tree later if it
  FB> turned out that it was part of an assignment or deletion. Ugly.

Indeed, and perhaps a compelling case against the richer AST. I assume the difference is that CPython's compiler is top-down and yours is bottom-up?

  FB> As I have said before, I think the desire for correct typing in
  FB> the tree is overrated and I would rather remove the 'assign' sum
  FB> altogether and add a flag to the Attribute, Subscript, Name,
  FB> List and Tuple nodes that captured the use of the node (like
  FB> 'Get', 'Set' and 'Del').

I'll noodle with the code generator in the compiler package and see what it would look like if the AssignXXX nodes went away.

Jeremy

From bckfnn@worldonline.dk Tue Apr 16 16:49:46 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Tue, 16 Apr 2002 15:49:46 GMT
Subject: [Compiler-sig] dipping my toe in...
In-Reply-To: <15548.15231.393441.547286@12-248-41-177.client.attbi.com>
References: <15548.15231.393441.547286@12-248-41-177.client.attbi.com>
Message-ID: <3cbc4211.33692707@mail.wanadoo.dk>

On Tue, 16 Apr 2002 09:55:59 -0500, you wrote:

>
>    -- 'type' is a bad name
>    | Raise(expr? type, expr? inst, expr? tback)
>
>How about "exc" instead?
>
>Does the "?" imply the preceding fields must be present if that is? For
>example, if we have an "inst", does that imply we also have a "type" the
>same way optional args work in Python?

No, ASDL cannot capture that requirement in its syntax. Since we are controlling the code generation, we could decide to interpret the '?' the same way as optional args, but we would get into some trouble with the lower/upper bounds in Slice and with the starargs/kwargs, where both can be optional.
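To make the ASDL point concrete: each expr? field is optional on its own, so a constructor for Raise(expr? type, expr? inst, expr? tback) accepts a traceback with no type or instance. The positional rule Skip has in mind, raise [type [, inst [, tback]]], has to be checked separately by whoever builds the AST. The class and the check_raise() helper below are a hypothetical sketch of that split, not generated code.

```python
class Raise:
    # Mirrors Raise(expr? type, expr? inst, expr? tback): every field
    # defaults to None independently of the others.
    def __init__(self, type=None, inst=None, tback=None):
        self.type = type
        self.inst = inst
        self.tback = tback


def check_raise(node):
    """Enforce raise [type [, inst [, tback]]], which ASDL's '?' cannot express."""
    if node.inst is not None and node.type is None:
        raise ValueError("raise: instance given without a type")
    if node.tback is not None and node.inst is None:
        raise ValueError("raise: traceback given without an instance")
    return node


odd = Raise(tback="tb")          # the constructor happily accepts this
check_raise(Raise("E", "v"))     # passes the extra check
```

The same split applies to Slice(expr? lower, expr? upper), where either bound may legitimately be missing, which is why reading '?' as Python-style positional optionality would not work everywhere.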
For easier mapping into java, I would prefer that we only use positional arguments, but if we want keyword arguments badly enough, I can also add support for them to the ctors.

regards,
finn

From bckfnn@worldonline.dk Tue Apr 16 17:17:48 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Tue, 16 Apr 2002 16:17:48 GMT
Subject: [Compiler-sig] Assign nodes
In-Reply-To: <15548.18203.915142.66377@slothrop.zope.com>
References: <3cbbe809.10645387@mail.wanadoo.dk> <15548.18203.915142.66377@slothrop.zope.com>
Message-ID: <3cbc489a.35366063@mail.wanadoo.dk>

[Jeremy]
>>>>>> "FB" == Finn Bock writes:
>
>  FB> And I don't like this change one single bit. Yes sure, it looks
>  FB> prettier than using Lvalue and it captures useful information,
>  FB> but it makes a one-pass AST builder a lot harder to do.
>
>I felt it simplified the code generator for the compiler package, but
>I don't have any experience with a one-pass AST builder. So it's hard
>for me to judge what the tradeoffs are. You haven't written a code
>generator, right?

No, not from scratch. I did add the augassign code to both our bytecode generator and our javacode generator, so I do appreciate the benefit of being able to tell an evaluated Slice node from an assignment Slice node (and from a delete Slice node and from an AugAssign Slice node).

>If so, we've each got experience with one end of
>the problem, but not with both.
>
>  FB> You probably don't care because you have an intermediate parse
>  FB> tree with all the context needed to know if you must create a
>  FB> Name node or an AssignName node.
>
>I'd be interested in taking a look at the one-pass AST builder.
>Eventually, I'd like to have one for CPython.
>
>  FB> When my parser detects a name, I have no way of knowing if an
>  FB> equal sign will turn up later. So I'll have to pick one node
>  FB> type (like Name) and rebuild that part of the tree later if it
>  FB> turned out that it was part of an assignment or deletion. Ugly.
>
>Indeed, and perhaps a compelling case against the richer AST. I
>assume the difference is that CPython's compiler is top-down and yours
>is bottom-up?

Yes, jjtree creates the AST bottom-up.

>  FB> As I have said before, I think the desire for correct typing in
>  FB> the tree is overrated and I would rather remove the 'assign' sum
>  FB> altogether and add a flag to the Attribute, Subscript, Name,
>  FB> List and Tuple nodes that captured the use of the node (like
>  FB> 'Get', 'Set' and 'Del').
>
>I'll noodle with the code generator in the compiler package and see
>what it would look like if the AssignXXX nodes went away.

Don't forget the flag! I prefer the flag (instead of separate classes) because it is a lot easier and faster to change an int in the sub-tree than it is to recreate the sub-tree with different classes.

Let's not get too hung up on it, though; I can also implement it with AssignXXXX (and DeleteXXXX and AugAssignXXXX) nodes.

regards,
finn

From jeremy@zope.com Tue Apr 16 17:47:11 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Tue, 16 Apr 2002 12:47:11 -0400
Subject: [Compiler-sig] Assign nodes
In-Reply-To: <3cbc489a.35366063@mail.wanadoo.dk>
References: <3cbbe809.10645387@mail.wanadoo.dk> <15548.18203.915142.66377@slothrop.zope.com> <3cbc489a.35366063@mail.wanadoo.dk>
Message-ID: <15548.21903.521485.622844@slothrop.zope.com>

It looks like the code generator, as you mentioned, has to distinguish between store and del anyway. So getting rid of the special case for load just means making it a three-part default test.

As it happens, the case of names is handled by three separate methods -- loadName(), storeName(), and delName() -- that are called from the visitor. So the separate node types don't really buy anything.

Does the following patch look good?
Jeremy

Index: python.asdl
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/ast/python.asdl,v
retrieving revision 1.12
diff -c -c -r1.12 python.asdl
*** python.asdl 16 Apr 2002 03:20:45 -0000 1.12
--- python.asdl 16 Apr 2002 16:43:55 -0000
***************
*** 8,16 ****
        | ClassDef(identifier name, expr* bases, stmt* body)
        | Return(expr value)
        | Yield(expr value)
!       | Del(assign* targets)
!       | Assign(assign* targets, expr value)
!       | AugAssign(assign target, operator op, expr value)
        -- not sure if bool is allowed, can always use int
        | Print(expr? dest, expr* value, bool nl)
--- 8,16 ----
        | ClassDef(identifier name, expr* bases, stmt* body)
        | Return(expr value)
        | Yield(expr value)
!       | Del(expr* targets)
!       | Assign(expr* targets, expr value)
!       | AugAssign(expr target, operator op, expr value)
        -- not sure if bool is allowed, can always use int
        | Print(expr? dest, expr* value, bool nl)
***************
*** 55,71 ****
        | Num(string n) -- string representation of a number
        | Str(string s) -- need to specify raw, unicode, etc?
        -- other literals? bools?
!       | Attribute(expr value, identifier attr)
!       | Subscript(expr value, slice slice)
!       | Name(identifier id)
!       | List(expr* elts) | Tuple(expr *elts)
!
!     -- the subset of expressions that are valid as the target of
!     -- assignments.
!     assign = AssignAttribute(expr value, identifier attr)
!            | AssignSubscript(expr value, slice slice)
!            | AssignName(identifier id)
!            | AssignList(expr* elts) | AssignTuple(expr *elts)
      slice = Ellipsis | Slice(expr? lower, expr? upper)
        -- maybe Slice and ExtSlice should be merged...
--- 55,69 ----
        | Num(string n) -- string representation of a number
        | Str(string s) -- need to specify raw, unicode, etc?
        -- other literals? bools?
!
!       -- the following expression can appear in assignment context
!       | Attribute(expr value, identifier attr, expr_context ctx)
!       | Subscript(expr value, slice slice, expr_context ctx)
!       | Name(identifier id, expr_context ctx)
!       | List(expr* elts, expr_context ctx)
!       | Tuple(expr *elts, expr_context ctx)
!
!     expr_context = Load | Store | Del
      slice = Ellipsis | Slice(expr? lower, expr? upper)
        -- maybe Slice and ExtSlice should be merged...
***************
*** 84,90 ****
      -- not sure what to call the first argument for raise and except
!     except = (expr? type, assign? name, stmt* body)
      -- XXX need to handle 'def f((a, b)):'
      arguments = (identifier* args, identifier? vararg,
--- 82,88 ----
      -- not sure what to call the first argument for raise and except
!     except = (expr? type, expr? name, stmt* body)
      -- XXX need to handle 'def f((a, b)):'
      arguments = (identifier* args, identifier? vararg,

From bckfnn@worldonline.dk Tue Apr 16 19:09:48 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Tue, 16 Apr 2002 18:09:48 GMT
Subject: [Compiler-sig] Assign nodes
In-Reply-To: <15548.21903.521485.622844@slothrop.zope.com>
References: <3cbbe809.10645387@mail.wanadoo.dk> <15548.18203.915142.66377@slothrop.zope.com> <3cbc489a.35366063@mail.wanadoo.dk> <15548.21903.521485.622844@slothrop.zope.com>
Message-ID: <3cbc5a61.39917458@mail.wanadoo.dk>

[Jeremy]
>It looks like the code generator, as you mentioned, has to distinguish
>between store and del anyway. So getting rid of the special case for
>load just means making it a three-part default test.
>
>As it happens, the case of names is handled by three separate methods
>-- loadName(), storeName(), and delName() -- that are called from the
>visitor. So the separate node types don't really buy anything.
>
>Does the following patch look good?

At first glance it seems to be exactly what I needed. I'll take a closer look tomorrow.

regards,
finn

From jeremy@zope.com Tue Apr 16 22:24:53 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Tue, 16 Apr 2002 17:24:53 -0400
Subject: [Compiler-sig] dipping my toe in...
In-Reply-To: <15548.15231.393441.547286@12-248-41-177.client.attbi.com>
References: <15548.15231.393441.547286@12-248-41-177.client.attbi.com>
Message-ID: <15548.38565.81743.936691@slothrop.zope.com>

>>>>> "SM" == Skip Montanaro writes:

  SM>     -- 'type' is a bad name
  SM>     | Raise(expr? type, expr? inst, expr? tback)

  SM> How about "exc" instead?

Yes.

  SM> Does the "?" imply the preceding fields must be present if that
  SM> is? For example, if we have an "inst", does that imply we also
  SM> have a "type" the same way optional args work in Python?

It's not that powerful. It just says that the type is optional. So the Raise ctor above would actually accept a raise statement with only a traceback.

Jeremy

From skip@pobox.com Tue Apr 16 23:03:13 2002
From: skip@pobox.com (Skip Montanaro)
Date: Tue, 16 Apr 2002 17:03:13 -0500
Subject: [Compiler-sig] dipping my toe in...
In-Reply-To: <15548.38565.81743.936691@slothrop.zope.com>
References: <15548.15231.393441.547286@12-248-41-177.client.attbi.com> <15548.38565.81743.936691@slothrop.zope.com>
Message-ID: <15548.40865.696305.300429@12-248-41-177.client.attbi.com>

    SM> Does the "?" imply the preceding fields must be present if that is?
    SM> For example, if we have an "inst", does that imply we also have a
    SM> "type" the same way optional args work in Python?

    Jeremy> It's not that powerful. It just says that the type is optional.
    Jeremy> So the Raise ctor above would actually accept a raise statement
    Jeremy> with only a traceback.

So the parser is the "guard at the gate" to prevent such stuff from turning up in the AST?

I ask these because I'm still a little confused about what exactly this stuff does. Looking at test.py

    import ast
    print ast.transform("""global a, b, c
    a + b - c * 3
    """)

suggests that somehow it's parsing the Python code, but that's not what's happening. Still, it's not clear why ast_transform always returns None.

Sorry for the frontal lobe density...
Skip

From jeremy@zope.com Tue Apr 16 23:11:19 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Tue, 16 Apr 2002 18:11:19 -0400
Subject: [Compiler-sig] dipping my toe in...
In-Reply-To: <15548.40865.696305.300429@12-248-41-177.client.attbi.com>
References: <15548.15231.393441.547286@12-248-41-177.client.attbi.com> <15548.38565.81743.936691@slothrop.zope.com> <15548.40865.696305.300429@12-248-41-177.client.attbi.com>
Message-ID: <15548.41351.22256.793404@slothrop.zope.com>

>>>>> "SM" == Skip Montanaro writes:

  SM> Does the "?" imply the preceding fields must be present if that
  SM> is? For example, if we have an "inst", does that imply we also
  SM> have a "type" the same way optional args work in Python?

  Jeremy> It's not that powerful. It just says that the type is
  Jeremy> optional. So the Raise ctor above would actually accept a
  Jeremy> raise statement with only a traceback.

  SM> So the parser is the "guard at the gate" to prevent such stuff
  SM> from turning up in the AST?

Someone is responsible, but it's not clear who. In compile.c, a variety of tests occur during code generation, such as checks for illegal expressions as assignment targets. In any particular implementation, the front end is going to produce the AST. It's got to guarantee that the AST is valid.

The ast module I'm working on will probably do those checks on the completed AST before returning it. But I haven't gotten that far yet. It might end up detecting the problem while creating the AST. In fact, the expr_context patch I sent around earlier suggests that the error would be detected sooner, because only valid expression types have the expr_context slot.

  SM> I ask these because I'm still a little confused about what
  SM> exactly this stuff does. Looking at test.py

  SM>     import ast
  SM>     print ast.transform("""global a, b, c
  SM>     a + b - c * 3
  SM>     """)

  SM> suggests that somehow it's parsing the Python code, but that's
  SM> not what's happening.

Actually, it is what's happening.
It compiles the source, then converts it to an AST.

  SM> not what's happening. Still, it's not clear why ast_transform
  SM> always returns None.

Only because there isn't anything else useful to return. The AST is not a PyObject *. I thought about writing the pickling code and returning a pickle, but that seemed like too much work. Instead, I'm just exercising the ast transformation code and then throwing away the result. (You may also have noticed that I never free any memory. :-)

  SM> Sorry for the frontal lobe density...

It's not your fault. I'm not good at explaining what I'm currently doing, and I haven't finished yet :-).

Jeremy

From neal@metaslash.com Wed Apr 17 13:16:33 2002
From: neal@metaslash.com (Neal Norwitz)
Date: Wed, 17 Apr 2002 08:16:33 -0400
Subject: [Compiler-sig] Assign nodes
References: <3cbbe809.10645387@mail.wanadoo.dk>
Message-ID: <3CBD67A1.F3B58841@metaslash.com>

Jeremy:

In ast_for_augassign (astmodule.c), you are attempting to find the assign type, but is it correct for <<= and >>=? Don't you need to add something like:

    case '<':
        if (STR(n)[1] == '<')
            return LShift;
        fprintf(stderr, "invalid augassign: %s", STR(n));
        return 0;

And the same for the '>'?

Also, do you want me to do some little cleanups and checkin? Or would you prefer to discuss here first?

Do you have any small pieces you want me to try to help with?

Neal

From jeremy@zope.com Wed Apr 17 16:47:40 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Wed, 17 Apr 2002 11:47:40 -0400
Subject: [Compiler-sig] Assign nodes
In-Reply-To: <3CBD67A1.F3B58841@metaslash.com>
References: <3cbbe809.10645387@mail.wanadoo.dk> <3CBD67A1.F3B58841@metaslash.com>
Message-ID: <15549.39196.341872.192917@slothrop.zope.com>

>>>>> "NN" == Neal Norwitz writes:

  NN> Jeremy: In ast_for_augassign (astmodule.c), you are attempting
  NN> to find the assign type, but is it correct for <<= and >>= ?
  NN> Don't you need to add something like:

  NN>     case '<':
  NN>         if (STR(n)[1] == '<')
  NN>             return LShift;
  NN>         fprintf(stderr, "invalid augassign: %s", STR(n));
  NN>         return 0;

  NN> And the same for the '>'?

What is this checking for?

  NN> Also, do you want me to do some little cleanups and checkin? Or
  NN> would you prefer to discuss here first?

Feel free to clean up first and ask questions later.

  NN> Do you have any small pieces you want me to try to help with?

I was thinking it would be helpful for someone to work on the pickler code. I started, but didn't get very far.

I haven't given a lot of thought to how to expose the AST objects to Python. It seems like making them PyObjects introduces a lot of overhead that doesn't serve any purpose in the common case. (I'm assuming that the common case is the compiler using the AST internally to generate bytecode and then throwing it away.) Pickling the AST from C and unpickling it from Python seems like a simple way to share the AST without needing a PyObject-style interface.

Two other possible projects are better memory management and better error handling. I decided to basically punt on those for now and revisit them after the astmodule is complete. I suspect that an arena style of allocation may be useful, where memory for the AST is allocated from the arena and the arena is freed by one call when the AST is no longer needed.

Jeremy

From ecn@metaslash.com Thu Apr 18 04:02:24 2002
From: ecn@metaslash.com (Eric C. Newton)
Date: Wed, 17 Apr 2002 23:02:24 -0400
Subject: [Compiler-sig] AST observations
Message-ID: <20020417230224.A21385@ecn>

I've noticed a few things with the current implementation of ASTs in Python 2.2 from my work on PyChecker2. These are just some notes.

compiler.visitor: ASTVisitor

    As the comment says, it's not a visitor, it's a walker. Someone mentioned earlier that this is rather confusing.

    There is this comment:

        "If the visitor method returns a true value, the
        ASTVisitor will not traverse the child nodes."
    I see no code which checks the return value.

Performance:

    For me, the _cache mechanism actually slows down visitation. I found that pre-caching the method names in preorder() _is_ faster.

    Most of my dispatching uses the default dispatcher; the getChildNodes() method, along with compiler.ast.flatten and compiler.ast.flatten_nodes, are significant overheads.

    All that said, I re-wrote it using a number of techniques:

        removed the ability to add variable arguments to the walk (no improvement)

        fewer transformations from lists to tuples (minor improv.)

        smarter construction of lists (minor improv.)

        custom getChildNodes() for each class to eliminate calls to flatten() (minor improv.)

        pass a visit() function to a visitChildren() method (slower!)

        write a default recursive dispatcher in C (slower!)

Convention:

    Setting the "visit" method on the Visitor is, ahem, a novel approach. It's a convention that PyChecker (the use of, not the development of) doesn't like ("unknown method visit()"). I don't know if I don't like it, but it was unexpected. Passing the walker to the dispatch function is the sort of thing I would expect.

In general, pychecker2 calls "walk(node, Visitor())" a LOT. The first version of pychecker did a lot of things in a single pass. That is pretty efficient, but it's harder to add more checks without creating really dense code. I'm trying to structure pychecker2 around lots of independent checks, so it will be easier to contribute new code. The consequence is: I would really like the visitor stuff to run efficiently.

I gave up on trying to use recursion to figure out a Node's parents in an AST tree. Very often I need to know the parents of the Node I'm looking at, and using recursion to hold this information was becoming cumbersome. For the time being, I'm adding a parent link to all AST nodes just after a file is parsed. An alternative data-structure (heap?) would be just as swell so long as I can compute this efficiently.
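The node-to-parent map Eric describes can be computed in a single pass right after parsing. The sketch below uses today's standard-library ast module (ast.walk and ast.iter_child_nodes) rather than the 2002 compiler package, so its node types differ from the ones discussed in this thread; parent_map() and ancestors() are hypothetical helper names.

```python
import ast


def parent_map(tree):
    """Map every node to its parent in one traversal of the tree."""
    parents = {}
    for node in ast.walk(tree):
        for child in ast.iter_child_nodes(node):
            # Context nodes such as ast.Load() may be shared between many
            # parents, so their entries are not meaningful.
            parents[child] = node
    return parents


def ancestors(node, parents):
    """Yield the parent, grandparent, ... of node, up to the root."""
    while node in parents:
        node = parents[node]
        yield node


tree = ast.parse("def f(x):\n    return x + 1\n")
parents = parent_map(tree)
name = next(n for n in ast.walk(tree) if isinstance(n, ast.Name))
print([type(a).__name__ for a in ancestors(name, parents)])
# ['BinOp', 'Return', 'FunctionDef', 'Module']
```

With a map like this, a reusable "find all Name nodes" pass can run first and the parents can be looked up afterwards, without carrying ancestry through the recursion.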
Line numbers appear to be added to AST nodes in arbitrary ways.

Some interesting projects which re-write existing code might like other token information, like comments.

People have requested features for pychecker, like detecting unnecessary parens and semicolons, which are not possible, since these are not part of the AST.

-Eric

From bckfnn@worldonline.dk Thu Apr 18 09:52:00 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Thu, 18 Apr 2002 08:52:00 GMT
Subject: [Compiler-sig] AST observations
In-Reply-To: <20020417230224.A21385@ecn>
References: <20020417230224.A21385@ecn>
Message-ID: <3cbe77b9.3930812@mail.wanadoo.dk>

[Eric C. Newton]
>I've noticed a few things with the current implementation of ASTs in
>Python 2.2 from my work on PyChecker2. These are just some notes.
>[...]
>
>    There is this comment:
>
>        "If the visitor method returns a true value, the
>        ASTVisitor will not traverse the child nodes."
>
>    I see no code which checks the return value.

And that is IMO a very good thing: that it isn't implemented as documented. If the visitor/walker hijacked the return value of visitXXXX methods for its own purposes, the visitor would be completely useless in both our bytecode compiler and our javacode compiler.

>    Performance:
>
>        For me, the _cache mechanism actually slows down
>        visitation. I found that pre-caching the method names in
>        preorder() _is_ faster.
>
>        Most of my dispatching uses the default dispatcher; the
>        getChildNodes() method, along with compiler.ast.flatten
>        and compiler.ast.flatten_nodes are significant overheads.
>
>        All that said, I re-wrote it using a number of techniques:
>
>            removed the ability to add variable arguments to the walk
>            (no improvement)
>
>            fewer transformations from lists to tuples (minor improv.)
>
>            smarter construction of lists (minor improv.)
>
>            custom getChildNodes() for each class to eliminate
>            calls to flatten() (minor improv.)
>
>            pass a visit() function to a visitChildren() method (slower!)
>
>            write a default recursive dispatcher in C (slower!)
>
>    Convention:
>
>        Setting the "visit" method on the Visitor is, ahem, a novel
>        approach.

I think you are way too kind here. It sucks; it is plainly a hack and it is quite hard to map into efficient java.

>        It's a convention that PyChecker (the use of, not
>        the development of) doesn't like ("unknown method visit()").
>        I don't know if I don't like it, but it was unexpected.
>        Passing the walker to the dispatch function is the sort of
>        thing I would expect.
>
>In general, pychecker2 calls "walk(node, Visitor())" a LOT. The first
>version of pychecker did a lot of things in a single pass. That is
>pretty efficient, but it's harder to add more checks without creating
>really dense code. I'm trying to structure pychecker2 around lots of
>independent checks, so it will be easier to contribute new code. The
>consequence is: I would really like the visitor stuff to run
>efficiently.

I also want the visitor pattern to be superfast because I want to use it for the on-the-fly java bytecode generation. If it turns out that the chosen visitor pattern isn't sufficiently efficient, I'll be forced to make our own visitor pattern in parallel with the one in the compiler package.

>I gave up on trying to use recursion to figure out a Node's parents in
>an AST tree. Very often I need to know the parents of the Node I'm
>looking at, and using recursion to hold this information was becoming
>cumbersome. For the time being, I'm adding a parent link to all AST
>nodes just after a file is parsed. An alternative data-structure
>(heap?) would be just as swell so long as I can compute this
>efficiently.

I don't need a parent link for code generation (if I did, it would have been added already), so from my primary POV, adding a parent link is pure memory overhead.

OTOH, in my treebuilder I have hooks that would allow me to create a dictionary of node->parentnode mappings while I'm creating the AST tree.
So how is this for an alternative idea: the main methods (parse() and parseFile()) grow an optional dict=None argument. When that argument is not None, each created AST node is inserted as a key with its parent node as the value. >Line numbers appear to be added to AST nodes in arbitrary ways. > >Some interesting projects which re-write existing code might like >other token information, like comments. > >People have requested features for pychecker, like detecting >unnecessary parens and semicolons, which is not possible, since these >are not part of the AST. Again it seems like we have been overly focused on codegen. I'm looking forward to seeing Jeremy's thoughts on adding parsetree info to the AST. regards, finn From bckfnn@worldonline.dk Thu Apr 18 10:37:52 2002 From: bckfnn@worldonline.dk (Finn Bock) Date: Thu, 18 Apr 2002 09:37:52 GMT Subject: [Compiler-sig] AST observations In-Reply-To: <20020417230224.A21385@ecn> References: <20020417230224.A21385@ecn> Message-ID: <3cbe9072.10260573@mail.wanadoo.dk> [Eric C. Newton] >I gave up on trying to use recursion to figure out a Node's parents in >an AST tree. Very often I need to know the parents of the Node I'm >looking at, and using recursion to hold this information was becoming >cumbersome. If you only need the ancestry of the current node in the visitXXXX methods, then the open_level(), close_level() visitor methods that I suggested previously should work nicely:

class MyVisitor(ASTVisitor):
    def __init__(self):
        self.ancestry = []

    def open_level(self, node):
        self.ancestry.append(node)

    def close_level(self, node):
        self.ancestry.pop()

    def visitFunctionDef(self, node):
        print "parent is", self.ancestry[-2]

regards, finn From ecn@metaslash.com Thu Apr 18 12:14:08 2002 From: ecn@metaslash.com (Eric C.
Newton) Date: Thu, 18 Apr 2002 07:14:08 -0400 Subject: [Compiler-sig] AST observations In-Reply-To: <3cbe9072.10260573@mail.wanadoo.dk>; from bckfnn@worldonline.dk on Thu, Apr 18, 2002 at 09:37:52AM +0000 References: <20020417230224.A21385@ecn> <3cbe9072.10260573@mail.wanadoo.dk> Message-ID: <20020418071408.A24756@ecn> Yes, I did a lot of this, too. I gave up when I wanted to re-use something like a "find all Name Nodes" visitor, and then look up the parents later. I can certainly keep a reverse map. I'm already keeping a reverse map from symbol to node, too. On Thu, Apr 18, 2002 at 09:37:52AM +0000, Finn Bock wrote: > [Eric C. Newton] > > >I gave up on trying to use recursion to figure out a Node's parents in > >an AST tree. Very often I need to know the parents of the Node I'm > >looking at, and using recursion to hold this information was becoming > >cumbersome. > > If you only need the ancestry of the current node in the visitXXXX > methods, then the open_level(), close_level() visitor methods that I > suggested previously should work nicely: > > class MyVisitor(ASTVisitor): > def __init__(self): > self.ancestry = [] > > def open_level(self, node): > self.ancestry.append(node) > > def close_level(self, node): > self.ancestry.pop() > > def visitFunctionDef(self, node): > print "parent is", self.ancestry[-2] > > > regards, > finn From jeremy@zope.com Thu Apr 18 16:48:46 2002 From: jeremy@zope.com (Jeremy Hylton) Date: Thu, 18 Apr 2002 11:48:46 -0400 Subject: [Compiler-sig] AST observations In-Reply-To: <20020417230224.A21385@ecn> References: <20020417230224.A21385@ecn> Message-ID: <15550.60126.578441.518744@slothrop.zope.com> Starting with the last stuff first... >>>>> "ECN" == Eric C Newton writes: ECN> Line numbers appear to be added to AST nodes in arbitrary ways. Indeed. It hasn't been obvious when to add line numbers. The original transformer added line numbers to statements, as far as I could tell. 
This didn't seem sufficient, because, e.g., the except handler lines aren't individual statements. ECN> Some interesting projects which re-write existing code might ECN> like other token information, like comments. Yes. Refactoring tools would really like to have detailed position information about each character. I wouldn't find it acceptable if such a tool reformatted my code. ECN> People have requested features for pychecker, like detecting ECN> unnecessary parens and semicolons, which is not possible, since ECN> these are not part of the AST. That's by design. An AST is a compiler intermediate representation and parens and semicolons aren't part of the intermediate representation. If the analysis has to do with the syntax of the language, I don't think the AST is the right place to check it. How do you tell when a pair of parens is unnecessary, BTW? I've often used parens around the test part of an if statement so that Emacs formats it nicely when it takes up more than one line. I find this a completely acceptable use of "unnecessary" parens. But regardless of whether the AST should be used for simple syntax checking (maybe parens aren't just a syntactic issue), it would be really helpful to decorate the AST with information about the tokens that make up each node. I don't know enough about the Python parser to know if it's possible to get the parser to pass it along to the AST transformer in the compiler. I have the impression that things like comments get tossed pretty early on. In general, the AST doesn't want to have all the detailed token information, because it doesn't care about them. It would waste time and space to record the information for the compiler. So if a particular app needs the extra token info, it seems like we could use the tokenize module to collect the token info and associate it with the AST. I'm not sure how this would work in any detail, or I would have tried it already :-).
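Jeremy's idea of collecting token information separately and associating it with the AST can be sketched with today's standard library. This is not the 2002 `compiler` package: it assumes Python 3.8+ (for `end_lineno` on `ast` nodes), uses `tokenize.generate_tokens` to find comments, and keeps them in a side table keyed by statement node, so the AST itself carries no extra data. The attachment rule (innermost statement whose line span contains the comment) is an illustrative assumption, not an established convention.

```python
import ast
import io
import tokenize


def comments_by_line(source):
    """Map line number -> comment text, found with the tokenize module."""
    result = {}
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.COMMENT:
            result[tok.start[0]] = tok.string
    return result


def attach_comments(source):
    """Parse source and build a side table of statement node -> comments.

    Each comment goes to the innermost statement whose lines
    (lineno..end_lineno) contain it; comments outside every
    statement are returned as orphans.
    """
    tree = ast.parse(source)
    stmts = [n for n in ast.walk(tree) if isinstance(n, ast.stmt)]
    table = {}
    orphans = []
    for line, text in sorted(comments_by_line(source).items()):
        enclosing = [s for s in stmts if s.lineno <= line <= s.end_lineno]
        if enclosing:
            inner = min(enclosing, key=lambda s: s.end_lineno - s.lineno)
            table.setdefault(inner, []).append(text)
        else:
            orphans.append(text)
    return tree, table, orphans


if __name__ == "__main__":
    src = "def f(x):  # entry\n    return x  # result\n\n# trailing\n"
    tree, table, orphans = attach_comments(src)
    for node, texts in table.items():
        print(type(node).__name__, node.lineno, texts)
    print("unattached:", orphans)
```

The compiler never sees the side table, which matches the point that the AST itself should not pay for token details only some tools need.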
Jeremy From jeremy@zope.com Thu Apr 18 16:50:25 2002 From: jeremy@zope.com (Jeremy Hylton) Date: Thu, 18 Apr 2002 11:50:25 -0400 Subject: [Compiler-sig] AST observations In-Reply-To: <3cbe77b9.3930812@mail.wanadoo.dk> References: <20020417230224.A21385@ecn> <3cbe77b9.3930812@mail.wanadoo.dk> Message-ID: <15550.60225.950885.692091@slothrop.zope.com> >>>>> "FB" == Finn Bock writes: FB> [Eric C. Newton] >> There is this comment: >> >> "If the visitor method returns a true value, the ASTVisitor will >> not traverse the child nodes." >> >> I see no code which checks the return value. FB> And that is IMO a very good thing that it isn't implemented as FB> documented. If the visitor/walker hijacked the return value of FB> visitXXXX methods for its own purposes, the visitor would be FB> completely useless in both our bytecode compiler and our FB> javacode compiler. Yes. I think the comment just needs to be removed. It seemed like a good idea when I started the project, but I don't think I ever found a use for it. Jeremy From bckfnn@worldonline.dk Thu Apr 18 19:20:47 2002 From: bckfnn@worldonline.dk (Finn Bock) Date: Thu, 18 Apr 2002 18:20:47 GMT Subject: [Compiler-sig] Changes to python.asdl Message-ID: <3cbf0ab3.41557296@mail.wanadoo.dk> Hi, I just committed a few small changes to the python.asdl (something caused syncmail to blow up so the checkin mail might be missing) and I'm a little worried that such changes will cause a lot of pain to your code for the CPython AST tree. I don't have a toolchain available so that I can compile and test your C code yet. I know that we are still in the sandbox and so we should be allowed to play, but I want to play nice.
regards, finn From jeremy@zope.com Thu Apr 18 19:31:41 2002 From: jeremy@zope.com (Jeremy Hylton) Date: Thu, 18 Apr 2002 14:31:41 -0400 Subject: [Compiler-sig] AST observations In-Reply-To: <20020417230224.A21385@ecn> References: <20020417230224.A21385@ecn> Message-ID: <15551.4365.375570.60402@slothrop.zope.com> >>>>> "ECN" == Eric C Newton writes: ECN> compiler.visitor: ECN> ASTVisitor ECN> As the comment says, it's not a visitor, it's a walker. ECN> Someone mentioned earlier that this is rather ECN> confusing. I'm sorry this is confusing, but I think it is one of the standard variations on the visitor pattern. It's certainly the case that the visitors we've all been writing have the signs of being a visitor, e.g. a method for each class of object. As for how the traversal occurs, GoF (p. 339) says: "Who is responsible for traversing the object structure? A visitor must visit each element of the object structure. The question is, how does it get there? We can put the responsibility in any of three places: in the object structure, in the visitor, or in a separate iterator object." The text goes on to discuss these alternatives and notes that you could also use an internal iterator that is a kind of hybrid between having the traversal in the object structure and using an iterator. In this case, the iterator calls a method on the visitor with the object as an argument as opposed to calling a method of the object with the visitor as the argument. It might be clearer to merge the walker and the visitor into a single class using inheritance. (I think the Walkabout variant described by Palsberg and Jay does this, cf. http://citeseer.nj.nec.com/palsberg97essence.html.) But I thought delegation would be clearer and would avoid the need for a magic base class that all visitors must inherit from. 
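To make the delegation concrete, here is a minimal sketch of a separate walker (the "separate iterator object" placement) that calls `visitXXX(node)` on a plain visitor object and gives the visitor a `visit` hook for explicit recursion, roughly in the spirit described above. The node classes, the `children()` method, and the `Walker` name are hypothetical, and this does not reproduce `compiler.visitor` exactly.

```python
class Node:
    """Hypothetical AST node base class; children() lists child nodes."""
    def children(self):
        return []


class Name(Node):
    def __init__(self, id):
        self.id = id


class Add(Node):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def children(self):
        return [self.left, self.right]


class Walker:
    """Traversal object that dispatches to visitor.visitXXX(node)."""

    def __init__(self, visitor):
        self.visitor = visitor
        self._cache = {}  # node class -> bound visitor method
        # Give the visitor a visit() hook so its methods can recurse.
        visitor.visit = self.dispatch

    def default(self, node):
        # No visitXXX method: just walk the children.
        for child in node.children():
            self.dispatch(child)

    def dispatch(self, node):
        cls = node.__class__
        meth = self._cache.get(cls)
        if meth is None:
            meth = getattr(self.visitor, "visit" + cls.__name__, self.default)
            self._cache[cls] = meth
        # Return values pass through untouched; the walker never inspects them.
        return meth(node)


class NameCollector:
    """Relies on default traversal; only Name nodes are handled."""
    def __init__(self):
        self.names = []

    def visitName(self, node):
        self.names.append(node.id)


class Printer:
    """Recurses explicitly through self.visit and uses return values."""
    def visitName(self, node):
        return node.id

    def visitAdd(self, node):
        return "(%s + %s)" % (self.visit(node.left), self.visit(node.right))
```

Because the walker never interprets what a `visitXXX` method returns, a code generator is free to use return values for its own purposes, which is the property Finn wants to preserve.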
Jeremy From jeremy@zope.com Thu Apr 18 19:36:41 2002 From: jeremy@zope.com (Jeremy Hylton) Date: Thu, 18 Apr 2002 14:36:41 -0400 Subject: [Compiler-sig] Changes to python.asdl In-Reply-To: <3cbf0ab3.41557296@mail.wanadoo.dk> References: <3cbf0ab3.41557296@mail.wanadoo.dk> Message-ID: <15551.4665.848301.588586@slothrop.zope.com> >>>>> "FB" == Finn Bock writes: FB> I know that we are still in the sandbox and so we should be FB> allowed to play, but I want to play nice. No problem here. We've got to get the AST right in the end. If my code is going to break, sooner is better than later :-). Jeremy From bckfnn@worldonline.dk Thu Apr 18 19:44:22 2002 From: bckfnn@worldonline.dk (Finn Bock) Date: Thu, 18 Apr 2002 18:44:22 GMT Subject: [Compiler-sig] Slices Message-ID: <3cbf10ad.43086805@mail.wanadoo.dk> Hi, Trying again for some feedback on the slice production. I still can't figure out how to use the existing Slice and ExtSlice, so I tried to make some changes like this:

        slice = Ellipsis | Slice(expr? lower, expr? upper)
              -- maybe Slice and ExtSlice should be merged...
-             | ExtSlice(expr* dims)
+             | ExtSlice(slice* dims)
+             | Index(expr value)

        boolop = And | Or

Which I use in the following way (only the slice part of the Subscript is shown):

L[1]        slice=Index[value=Num[n=1]]
L[1:2]      slice=Slice[lower=Num[n=1], upper=Num[n=2]]
L[1:2, 3]   slice=ExtSlice[dims=[Slice[lower=Num[n=1], upper=Num[n=2]],
                                 Index[value=Num[n=3]]]]

A single expr is wrapped by Index(), a slice is of course wrapped by Slice(), and a comma-separated list of slices is wrapped by an ExtSlice() object. Does it make sense? regards, finn From jeremy@zope.com Thu Apr 18 19:44:32 2002 From: jeremy@zope.com (Jeremy Hylton) Date: Thu, 18 Apr 2002 14:44:32 -0400 Subject: [Compiler-sig] AST observations In-Reply-To: <20020417230224.A21385@ecn> References: <20020417230224.A21385@ecn> Message-ID: <15551.5136.674941.912241@slothrop.zope.com> (I hope it's okay that I'm responding in little chunks.
There's been a lot to digest.) >>>>> "ECN" == Eric C Newton writes: ECN> Setting the "visit" method on the Visitor is, ahem, a ECN> novel approach. It's a convention that PyChecker (the ECN> use of, not the development of) doesn't like ("unknown ECN> method visit()"). I don't know if I don't like it, but ECN> it was unexpected. Passing the walker to the dispatch ECN> function is the sort of thing I would expect. Ahem is a nicer word than sucks :-). Anyway, it seemed like a clear delegation to me and the alternative seemed to involve a lot of helper methods that didn't serve any functional purpose and required visitors to inherit from a special base class. The joy of Python is not in writing reams of boring code, or something like that. I think the boring code would go something like this:

class VisitorBase:
    def __init__(self):
        self._visit_hook = None  # a hook for the walker

    def register_walker(self, walker):
        self._visit_hook = walker.dispatch

    def unregister_walker(self):
        self._visit_hook = None

    def visit(self, *args, **kwds):
        return self._visit_hook(*args, **kwds)

That's a dozen lines of code to do something that still needs to be explained in the doc string. So I figured I'd just use an assignment and explain it in the doc string. Jeremy From jeremy@zope.com Thu Apr 18 19:53:48 2002 From: jeremy@zope.com (Jeremy Hylton) Date: Thu, 18 Apr 2002 14:53:48 -0400 Subject: [Compiler-sig] AST observations In-Reply-To: <3cbe77b9.3930812@mail.wanadoo.dk> References: <20020417230224.A21385@ecn> <3cbe77b9.3930812@mail.wanadoo.dk> Message-ID: <15551.5692.594518.799386@slothrop.zope.com> Finally, I should say that I don't have any strong attachment to the visitor code in the compiler package. Nor do I have any strong attachment to the AST it defines. I've started over with python.asdl, and I don't have any problems with starting over on a new visitor structure. Let's find the ideas that work best for all our applications.
I'm probably going to want something visitor-like implemented in C for the builtin bytecode compiler; I don't have any idea what that will look like yet. Efficiency was a non-goal for compiler.visitor. I didn't even consider whether generating efficient Java code was possible. (What would be the point of writing in the subset of Python that can be translated efficiently to Java? ;-) We're kind of stuck with the compiler package as it exists now, since it's a std part of 2.2. But there's no reason it can't grow new classes that provide new or improved functionality. On the subject of what the right visitor style is, it looks like Eelco and Joost Visser are doing interesting work in this area on the guide pattern and visitor combinators, respectively. (I don't have any idea if the people are related, although the ideas are at some level :-). I don't have links handy, but a Google search on name + visitor will get you there. Eric-- Feel free to contribute concrete patches that make the existing visitor code faster. I tried to speed it up too, once, and didn't make much progress. Much as you saw, I didn't see obvious changes that made a big difference. Jeremy From bckfnn@worldonline.dk Thu Apr 18 21:03:11 2002 From: bckfnn@worldonline.dk (Finn Bock) Date: Thu, 18 Apr 2002 20:03:11 GMT Subject: [Compiler-sig] AST observations In-Reply-To: <15551.5692.594518.799386@slothrop.zope.com> References: <20020417230224.A21385@ecn> <3cbe77b9.3930812@mail.wanadoo.dk> <15551.5692.594518.799386@slothrop.zope.com> Message-ID: <3cbf20a4.47173832@mail.wanadoo.dk> [Jeremy] >Finally, I should say that I don't have any strong attachment to the >visitor code in the compiler package. Nor do I have any strong >attachment to the AST it defines. I've started over with python.asdl, >and I don't have any problems with starting over on a new visitor >structure. Let's find the ideas that work best for all our >applications.
I'm probably going to want something visitor-like >implemented in C for the builtin bytecode compiler; I don't have any >idea what that will look like yet. > >Efficiency was a non-goal for compiler.visitor. I didn't even >consider whether generating efficient Java code was possible. (What >would be the point of writing in the subset of Python that can be >translated efficiently to Java? ;-) I see the wink, but the reason is portability between CodeCompiler.java (which creates Java bytecode) and SimpleCompiler.py (which generates Java source code). Both code generators have to do the same work and they use a lot of the same tricks to do it, and right now they use the same Visitor API. I would not be happy if the two codegens had to use two different Visitor APIs just because one is written in Java and the other in Python. I would rather not use the official and documented visitor pattern at all, and instead roll my own visitors (for both my uses), like you plan on doing for your C bytecode codegen. >We're kind of stuck with the compiler package as it exists now, since it's >a std part of 2.2. But there's no reason it can't grow new classes >that provide new or improved functionality. Which raises the question: where do you intend to put the new AST classes and the supporting functions? A new module in the 'compiler' package? >On the subject of what the right visitor style is, ... It might just be my expectation for a visitor pattern that is wrong and visitor.py that is right. I believe I understood visitor.py when I read your explanation. regards, finn From ecn@metaslash.com Fri Apr 19 03:52:53 2002 From: ecn@metaslash.com (Eric C.
Newton) Date: Thu, 18 Apr 2002 22:52:53 -0400 Subject: [Compiler-sig] AST observations In-Reply-To: <15551.5136.674941.912241@slothrop.zope.com>; from jeremy@zope.com on Thu, Apr 18, 2002 at 02:44:32PM -0400 References: <20020417230224.A21385@ecn> <15551.5136.674941.912241@slothrop.zope.com> Message-ID: <20020418225253.A30822@ecn> > Anyway, it seemed like a clear delegation to me Right, but I expected the walker to pass itself, rather than giving the visitor a new method. I would be quite happy if the object was assigned, rather than a bound method:

    visitor.walker = self

A bit more conventional, but you won't need to pass the walker to every visit method. > So I figured I'd just use an assignment and explain it in the doc > string. That's fine. I don't see a need for accessors. -Eric From ecn@metaslash.com Fri Apr 19 12:26:52 2002 From: ecn@metaslash.com (Eric C. Newton) Date: Fri, 19 Apr 2002 07:26:52 -0400 Subject: [Compiler-sig] AST observations In-Reply-To: <15551.4365.375570.60402@slothrop.zope.com>; from jeremy@zope.com on Thu, Apr 18, 2002 at 02:31:41PM -0400 References: <20020417230224.A21385@ecn> <15551.4365.375570.60402@slothrop.zope.com> Message-ID: <20020419072652.B30822@ecn> > I'm sorry this is confusing, but I think it is one of the standard > variations on the visitor pattern. Clearly, the class with "visit" methods is a Visitor. Now this other thing is also called ASTVisitor, even though it delegates visitation to a third class with visit methods. Reminds me of Java code, where the same word appears 4 times:

    Thing thing = new Thing(thingy);

> "Who is responsible for traversing the object structure? A visitor > must visit each element of the object structure. The question is, how > does it get there? We can put the responsibility in any of three > places: in the object structure, in the visitor, or in a separate > iterator object." Ok, TreeIterator works for me, too.
8-) > It might be clearer to merge the walker and the visitor into a single > class using inheritance. (I think the Walkabout variant described by > Palsberg and Jay does this, > cf. http://citeseer.nj.nec.com/palsberg97essence.html.) But I > thought delegation would be clearer and would avoid the need for a > magic base class that all visitors must inherit from. The only advantage I can see for this approach is faster visitation: the base class could have default visit methods that would know how to iterate over the child nodes. getChildNodes() would no longer be necessary. -Eric From neal@metaslash.com Fri Apr 19 14:07:43 2002 From: neal@metaslash.com (Neal Norwitz) Date: Fri, 19 Apr 2002 09:07:43 -0400 Subject: [Compiler-sig] Error checking macros Message-ID: <3CC0169F.A8BB29B0@metaslash.com> I don't know if others have seen the following technique before. Eric and I have used it to greatly reduce lines of code for error checking. We borrowed this technique from ACE http://www.cs.wustl.edu/~schmidt/ACE.html (I think).

/* START MACRO */
#define ERR_NULL_CHECK(arg, func_name)                          \
    do {                                                        \
        if (!(arg)) {                                           \
            PyErr_SetString(PyExc_ValueError,                   \
                            "field " #arg " is required for "   \
                            func_name);                         \
            return NULL;                                        \
        }                                                       \
    } while (0)
/* END MACRO */

Then code that used to look like this (FunctionDef):

    if (!name) {
        PyErr_SetString(PyExc_ValueError,
                        "field name is required for FunctionDef");
        return NULL;
    }
    if (!args) {
        PyErr_SetString(PyExc_ValueError,
                        "field args is required for FunctionDef");
        return NULL;
    }
    if (!body) {
        PyErr_SetString(PyExc_ValueError,
                        "field body is required for FunctionDef");
        return NULL;
    }

Would become:

    ERR_NULL_CHECK(name, "FunctionDef");
    ERR_NULL_CHECK(args, "FunctionDef");
    ERR_NULL_CHECK(body, "FunctionDef");

Often, the macro has RETURN in the name, or something else to indicate that the macro could return. Is there any interest in this? Neal From ecn@metaslash.com Fri Apr 19 14:16:44 2002 From: ecn@metaslash.com (Eric C.
Newton) Date: Fri, 19 Apr 2002 09:16:44 -0400 Subject: [Compiler-sig] AST observations In-Reply-To: <15551.5136.674941.912241@slothrop.zope.com>; from jeremy@zope.com on Thu, Apr 18, 2002 at 02:44:32PM -0400 References: <20020417230224.A21385@ecn> <15551.5136.674941.912241@slothrop.zope.com> Message-ID: <20020419091644.B2537@ecn> > Ahem is a nicer word than sucks :-). Poisonous invective lowers morale. Except when directed at Neal. -Eric From skip@pobox.com Fri Apr 19 14:21:13 2002 From: skip@pobox.com (Skip Montanaro) Date: Fri, 19 Apr 2002 08:21:13 -0500 Subject: [Compiler-sig] Error checking macros In-Reply-To: <3CC0169F.A8BB29B0@metaslash.com> References: <3CC0169F.A8BB29B0@metaslash.com> Message-ID: <15552.6601.918072.944843@12-248-41-177.client.attbi.com> Neal> I don't know if others have seen the following technique before. Neal> Eric and I have used it to greatly reduce lines of code for error Neal> checking. This looks like it would be okay as long as you only call it when you hold no references to Python objects. In Python code you frequently can't just return NULL, but have to DECREF some Python objects first. I don't know if this problem will arise here (I don't see a lot of context in your example - is it just for NULL checking input args?), but I assume it might. Skip From neal@metaslash.com Fri Apr 19 14:36:50 2002 From: neal@metaslash.com (Neal Norwitz) Date: Fri, 19 Apr 2002 09:36:50 -0400 Subject: [Compiler-sig] Error checking macros References: <3CC0169F.A8BB29B0@metaslash.com> <15552.6601.918072.944843@12-248-41-177.client.attbi.com> Message-ID: <3CC01D72.ED1E0CB9@metaslash.com> Skip Montanaro wrote: > > Neal> I don't know if others have seen the following technique before. > Neal> Eric and I have used it to greatly reduce lines of code for error > Neal> checking. > > This looks like it would be okay as long as you only call it when you hold > no references to Python objects. 
In Python code you frequently can't just > return NULL, but have to DECREF some Python objects first. I don't know if > this problem will arise here (I don't see a lot of context in your example - > is it just for NULL checking input args?), but I assume it might. In this case, yes, I am only talking about checking input args. This is for generated code which is very regular. It's used to store AST info. Here's a bit more context. I actually wrote code to test this. The old code for a function looked like this: stmt_ty ClassDef(identifier name, asdl_seq * bases, asdl_seq * body) { stmt_ty p; if (!name) { PyErr_SetString(PyExc_ValueError, "field name is required for ClassDef"); return NULL; } if (!bases) { PyErr_SetString(PyExc_ValueError, "field bases is required for ClassDef"); return NULL; } if (!body) { PyErr_SetString(PyExc_ValueError, "field body is required for ClassDef"); return NULL; } p = (stmt_ty)malloc(sizeof(*p)); if (!p) { PyErr_SetString(PyExc_MemoryError, "no memory"); return NULL; } p->kind = ClassDef_kind; p->v.ClassDef.name = name; p->v.ClassDef.bases = bases; p->v.ClassDef.body = body; return p; } The new code looks like this: stmt_ty ClassDef(identifier name, asdl_seq * bases, asdl_seq * body) { stmt_ty p; ERR_NULL_CHECK(name, "ClassDef"); ERR_NULL_CHECK(bases, "ClassDef"); ERR_NULL_CHECK(body, "ClassDef"); CHECK_MALLOC(p, stmt_ty); p->kind = ClassDef_kind; p->v.ClassDef.name = name; p->v.ClassDef.bases = bases; p->v.ClassDef.body = body; return p; } From neal@metaslash.com Fri Apr 19 14:53:31 2002 From: neal@metaslash.com (Neal Norwitz) Date: Fri, 19 Apr 2002 09:53:31 -0400 Subject: [Compiler-sig] Error checking macros References: <3CC0169F.A8BB29B0@metaslash.com> <15552.6601.918072.944843@12-248-41-177.client.attbi.com> Message-ID: <3CC0215B.281E627E@metaslash.com> I forgot to mention that using the macros drops the lines of code from 1061 to 646, about 40%. 
Neal From bckfnn@worldonline.dk Fri Apr 19 15:33:49 2002 From: bckfnn@worldonline.dk (Finn Bock) Date: Fri, 19 Apr 2002 14:33:49 GMT Subject: [Compiler-sig] AST observations In-Reply-To: <20020419072652.B30822@ecn> References: <20020417230224.A21385@ecn> <15551.4365.375570.60402@slothrop.zope.com> <20020419072652.B30822@ecn> Message-ID: <3cc02a9c.23311390@mail.wanadoo.dk> [Jeremy] > It might be clearer to merge the walker and the visitor into a single > class using inheritance. (I think the Walkabout variant described by > Palsberg and Jay does this, > cf. http://citeseer.nj.nec.com/palsberg97essence.html.) But I > thought delegation would be clearer and would avoid the need for a > magic base class that all visitors must inherit from. [Eric] >The only advantage I can see for this approach is faster visitation: >the base class could have default visit methods that would know how to >iterate over the child nodes. getChildNodes() would no longer be >necessary. You would not need a base class to get that benefit. I proposed a 'traverse' method on the AST nodes that will iterate over the children of the node like this:

class Module:
    def traverse(self, walker):
        for stmt in self.body:
            walker.visit(stmt)

That way the information about children is kept in the AST nodes. You would only need the visitor base class to please Jython. regards, finn From bckfnn@worldonline.dk Fri Apr 19 15:41:37 2002 From: bckfnn@worldonline.dk (Finn Bock) Date: Fri, 19 Apr 2002 14:41:37 GMT Subject: [Compiler-sig] AST observations In-Reply-To: <15551.4365.375570.60402@slothrop.zope.com> References: <20020417230224.A21385@ecn> <15551.4365.375570.60402@slothrop.zope.com> Message-ID: <3cc02ad4.23366799@mail.wanadoo.dk> [Jeremy] >It might be clearer to merge the walker and the visitor into a single >class using inheritance. (I think the Walkabout variant described by >Palsberg and Jay does this, > cf. http://citeseer.nj.nec.com/palsberg97essence.html.)
Yes, and so does the Visitor pattern they describe in 2.3. Based on the performance measurements in chapter 4, at least I hope you understand why I argue for a static double dispatch Visitor instead of a dynamic dispatching Walkabout pattern. The added flexibility of dynamic dispatch is pure YAGNI for me. Since I can control the code generated for each AST node, it would be plain wrong not to add an 'accept' method. regards, finn From bckfnn@worldonline.dk Fri Apr 19 15:43:36 2002 From: bckfnn@worldonline.dk (Finn Bock) Date: Fri, 19 Apr 2002 14:43:36 GMT Subject: [Compiler-sig] Number classes Message-ID: <3cc02c99.23820542@mail.wanadoo.dk> Hi, I would like to separate the number types in some way. I have used the patch below in Jython, but adding a flag to the Num() ctor would also be fine. I think this is needed because all my codegens need to generate different code for each of the number types. Also, I have the information available from my lexer and it would be a pity to lose the info just to recreate it again in the visitNum() method.

        | Call(expr func, expr* args, keyword* keywords, expr? starargs, expr? kwargs)
        | Repr(expr value)
-       | Num(string n) -- string representation of a number
+       | IntNum(string n) -- string representation of an integer
+       | FloatNum(string n) -- string representation of a float
+       | LongNum(string n) -- string representation of a long
+       | ComplexNum(string n) -- string representation of a complex
        | Str(string s) -- need to specify raw, unicode, etc?
        -- other literals? bools?

['Float' and 'Long' are unfortunate classnames in Java] I think the compiler package also has the number type available as the type of the Const() argument, and that solution would be even better, but I don't know how to express that in ASDL. Could we invent an anonymous 'object' builtin ASDL type? Thoughts?
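Finn's point that the number kind is already known when the literal is lexed can be illustrated by classifying the literal text directly. This is a sketch assuming Python 2.2 literal rules (`L` suffix for longs, `j` for imaginary numbers, `0x` for hex). It ignores the fact that an oversized plain integer literal was turned into a long, and the returned names simply mirror the constructors proposed in the patch.

```python
def num_kind(literal):
    """Return the proposed constructor name for a numeric literal string.

    Sketch only: assumes Python 2.2-style literals and does not handle
    plain integer literals too large for a machine int.
    """
    s = literal.lower()
    if s.endswith("j"):
        return "ComplexNum"   # imaginary literal, e.g. 3j or 1.5e3j
    if s.endswith("l"):
        return "LongNum"      # explicit long suffix, e.g. 10L or 0xffL
    if s.startswith("0x"):
        return "IntNum"       # hex digits may contain 'e', so check before floats
    if "." in s or "e" in s:
        return "FloatNum"     # decimal point or exponent
    return "IntNum"           # decimal or octal integer
```

A lexer that already knows which rule matched would simply record that decision instead of re-deriving it like this, which is the duplication Finn wants to avoid in visitNum().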
regards, finn From jeremy@zope.com Fri Apr 19 15:42:31 2002 From: jeremy@zope.com (Jeremy Hylton) Date: Fri, 19 Apr 2002 10:42:31 -0400 Subject: [Compiler-sig] Error checking macros In-Reply-To: <3CC0215B.281E627E@metaslash.com> References: <3CC0169F.A8BB29B0@metaslash.com> <15552.6601.918072.944843@12-248-41-177.client.attbi.com> <3CC0215B.281E627E@metaslash.com> Message-ID: <15552.11479.408447.247737@slothrop.zope.com> I think the macros are probably a good idea here. In general I don't like macros that hide a return, because it obscures the control flow. On the third hand, this is all generated code so it doesn't actually matter what it looks like. You've probably noticed, though, that I skipped error checking almost completely in the astmodule. I'm worried about how much longer the code will be in *that* module. Jeremy From jeremy@zope.com Fri Apr 19 15:59:39 2002 From: jeremy@zope.com (Jeremy Hylton) Date: Fri, 19 Apr 2002 10:59:39 -0400 Subject: [Compiler-sig] AST observations In-Reply-To: <3cc02ad4.23366799@mail.wanadoo.dk> References: <20020417230224.A21385@ecn> <15551.4365.375570.60402@slothrop.zope.com> <3cc02ad4.23366799@mail.wanadoo.dk> Message-ID: <15552.12507.204189.483828@slothrop.zope.com> >>>>> "FB" == Finn Bock writes: FB> [Jeremy] >> It might be clearer to merge the walker and the visitor into a >> single class using inheritance. (I think the Walkabout variant >> described by Palsberg and Jay does this, >> cf. http://citeseer.nj.nec.com/palsberg97essence.html.) FB> Yes, and so does the Visitor pattern they describe in FB> 2.3. Based on the performance measurements in chapter 4, at least FB> I hope you understand why I argue for a static double dispatch FB> Visitor instead of a dynamic dispatching Walkabout pattern. The FB> added flexibility of dynamic dispatch is pure YAGNI for FB> me. Since I can control the code generated for each AST node, it FB> would be plain wrong not to add an 'accept' method. Yes, indeed.
I wasn't trying to do anything efficiently, and I definitely did not care about Java performance when I wrote all the code. For use in the core of Jython or CPython, however, performance is an important consideration. It makes complete sense to generate all the visitor dispatch code statically for Java and C. (I wonder how much performance difference it makes for 100% pure Python.) Jeremy From jeremy@zope.com Fri Apr 19 16:10:39 2002 From: jeremy@zope.com (Jeremy Hylton) Date: Fri, 19 Apr 2002 11:10:39 -0400 Subject: [Compiler-sig] Number classes In-Reply-To: <3cc02c99.23820542@mail.wanadoo.dk> References: <3cc02c99.23820542@mail.wanadoo.dk> Message-ID: <15552.13167.585346.444380@slothrop.zope.com> For CPython, I've got a single routine called parsenumber(). It converts a string to a PyObject * of the appropriate type. So for my code, the only way I could find out if I've got an int vs. a complex is to parse it and check the return type. But once I've got the PyObject *, there's no need to pass a string to the ctor. Is the 'object' type you're thinking of for ASDL the generic object type of the Python implementation? So I would actually pass the ComplexNum ctor a PyObject *? For the code generator, the various number objects are all treated the same way once they're parsed. (Unless the compiler was doing some constant folding, I suppose.) If there's no difference to the way the numbers are handled, it would be better to mark a generic Num type with some flags or attributes that provide the extra info. Jeremy From neal@metaslash.com Fri Apr 19 16:44:57 2002 From: neal@metaslash.com (Neal Norwitz) Date: Fri, 19 Apr 2002 11:44:57 -0400 Subject: [Compiler-sig] Bug in astmodule? Message-ID: <3CC03B79.AFF48E1F@metaslash.com>

    if (strcmp(STR(CHILD(n, 1)), "in") == 0)
        return NotIn;
    if (strcmp(STR(CHILD(n, 0)), "is") == 0)
        return IsNot;

Shouldn't the 2nd if use CHILD(n, 1) like the first?
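The two checks differ because the distinguishing keyword sits in a different position in each two-token operator: in `not in` the word `in` is the second token, while in `is not` the word `is` is the first. A Python sketch of the same decision, working on a hypothetical list of token strings rather than on CPython's parse-tree children, makes the asymmetry explicit. The operator names are assumed to match the `cmpop` names used in python.asdl.

```python
# One-token comparison operators and their assumed cmpop names.
SINGLE_TOKEN_OPS = {
    "<": "Lt", ">": "Gt", "==": "Eq", ">=": "GtE", "<=": "LtE",
    "!=": "NotEq", "<>": "NotEq", "in": "In", "is": "Is",
}


def comp_op(tokens):
    """Map the token strings of a comparison operator to a cmpop name."""
    if len(tokens) == 1:
        return SINGLE_TOKEN_OPS[tokens[0]]
    if len(tokens) == 2:
        if tokens[1] == "in":    # "not in": the keyword 'in' is token 1
            return "NotIn"
        if tokens[0] == "is":    # "is not": the keyword 'is' is token 0
            return "IsNot"
    raise ValueError("unknown comparison operator: %r" % (tokens,))
```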
Neal From bckfnn@worldonline.dk Fri Apr 19 16:56:38 2002 From: bckfnn@worldonline.dk (Finn Bock) Date: Fri, 19 Apr 2002 15:56:38 GMT Subject: [Compiler-sig] Number classes In-Reply-To: <15552.13167.585346.444380@slothrop.zope.com> References: <3cc02c99.23820542@mail.wanadoo.dk> <15552.13167.585346.444380@slothrop.zope.com> Message-ID: <3cc03743.26549916@mail.wanadoo.dk> [Jeremy] >For CPython, I've got a single routine called parsenumber(). It >converts a string to a PyObject * of the appropriate type. So for my >code, the only way I could find out if I've got an int vs. a complex >is to parse it and check the return type. But once I've got the >PyObject *, there's no need to pass a string to the ctor. > >Is the 'object' type you're thinking of for ASDL the generic object type >of the Python implementation? So I would actually pass the ComplexNum >ctor a PyObject *? Yes, that was the idea, but then there would only be one Num(object n) ctor. Is it a useless brainfart? >For the code generator, the various number objects are all treated the >same way once they're parsed. (Unless the compiler was doing some >constant folding, I suppose.) Interesting. I need to do something different for each num type:

    public Object visitIntNum(IntNum node) throws Exception {
        module.PyInteger(Integer.parseInt(node.n)).get(code);
        return null;
    }

    public Object visitLongNum(LongNum node) throws Exception {
        module.PyLong(node.n).get(code);
        return null;
    }

>If there's no difference to the way the >numbers are handled, it would be better to mark a generic Num type >with some flags or attributes that provide the extra info. That will work for me too, but as I understand your first paragraph above, CPython doesn't know the type without parsing the string.
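A rough Python model of a parsenumber()-style routine shows why the kind of a number is only known once the literal has been parsed. It handles Python 2-era literals: decimal, octal with a leading 0, hex, long with an L suffix, float, and imaginary with a j suffix. The function name and its (kind, value) return shape are assumptions for this sketch. It is not CPython's parsenumber(), and under Python 3 a "Long" value is simply an int.

```python
def parse_number(literal):
    """Return a hypothetical (kind, value) pair for a Python 2-style numeric literal."""
    s = literal
    if s[-1] in "jJ":
        # Imaginary literal such as "2j" or "1.5e3J".
        return "Complex", complex(0, float(s[:-1]))
    if s[-1] in "lL":
        # Long literal: parse the digits as an integer, then tag it Long.
        _, value = parse_number(s[:-1])
        return "Long", value
    if s[:2].lower() == "0x":
        # Hex is checked before floats so that "0x1e" is not read as an exponent.
        return "Int", int(s, 16)
    if any(c in s for c in ".eE"):
        return "Float", float(s)
    if len(s) > 1 and s[0] == "0":
        # Python 2 read a leading 0 as octal: "017" == 15.
        return "Int", int(s, 8)
    return "Int", int(s)
```

The same string prefix ("0") can lead to an Int, a Float or a Long depending on what follows. This is why a separate type tag adds nothing once the value has been produced.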
regards, finn From jeremy@zope.com Fri Apr 19 17:58:39 2002 From: jeremy@zope.com (Jeremy Hylton) Date: Fri, 19 Apr 2002 12:58:39 -0400 Subject: [Compiler-sig] Number classes In-Reply-To: <3cc03743.26549916@mail.wanadoo.dk> References: <3cc02c99.23820542@mail.wanadoo.dk> <15552.13167.585346.444380@slothrop.zope.com> <3cc03743.26549916@mail.wanadoo.dk> Message-ID: <15552.19647.972709.314308@slothrop.zope.com> >>>>> "FB" == Finn Bock writes: >> If there's no difference to the way the numbers are handled, it >> would be better to mark a generic Num type with some flags or >> attributes that provide the extra info. FB> That will work for me too, but as I understand your first FB> paragraph above, CPython doesn't know the type without parsing the FB> string. If we have

    expr = Num(object value, num_type type)
    num_type = Int | Long | Float | Complex

Then I can parse the string and create a Num, passing the PyObject * and setting the appropriate type flag. I could also handle separate IntNum, LongNum, etc. ctors, but that seems like more nodes than we really need. If the Num node(s) need to have the type specified, then I'd like it to take an object, not a string. Jeremy From jeremy@zope.com Fri Apr 19 18:19:48 2002 From: jeremy@zope.com (Jeremy Hylton) Date: Fri, 19 Apr 2002 13:19:48 -0400 Subject: [Compiler-sig] Bug in astmodule? In-Reply-To: <3CC03B79.AFF48E1F@metaslash.com> References: <3CC03B79.AFF48E1F@metaslash.com> Message-ID: <15552.20916.228865.907994@slothrop.zope.com> >>>>> "NN" == Neal Norwitz writes:

NN>     if (strcmp(STR(CHILD(n, 1)), "in") == 0)
NN>         return NotIn;
NN>     if (strcmp(STR(CHILD(n, 0)), "is") == 0)
NN>         return IsNot;

NN> Shouldn't the second if use CHILD(n, 1), like the first?

No. One is checking "not in" and the other is "is not". I'll add a comment, though. Jeremy From neal@metaslash.com Fri Apr 19 18:28:12 2002 From: neal@metaslash.com (Neal Norwitz) Date: Fri, 19 Apr 2002 13:28:12 -0400 Subject: [Compiler-sig] Bug in astmodule?
References: <3CC03B79.AFF48E1F@metaslash.com> <15552.20916.228865.907994@slothrop.zope.com> Message-ID: <3CC053AC.13306123@metaslash.com> Jeremy Hylton wrote: > > >>>>> "NN" == Neal Norwitz writes: >
> NN>     if (strcmp(STR(CHILD(n, 1)), "in") == 0)
> NN>         return NotIn;
> NN>     if (strcmp(STR(CHILD(n, 0)), "is") == 0)
> NN>         return IsNot;
>
> NN> Shouldn't the second if use CHILD(n, 1), like the first?
>
> No. One is checking "not in" and the other is "is not".

Oops, I obviously wasn't thinking. It makes perfect sense. Neal From bckfnn@worldonline.dk Fri Apr 19 19:05:37 2002 From: bckfnn@worldonline.dk (Finn Bock) Date: Fri, 19 Apr 2002 18:05:37 GMT Subject: [Compiler-sig] Number classes In-Reply-To: <15552.19647.972709.314308@slothrop.zope.com> References: <3cc02c99.23820542@mail.wanadoo.dk> <15552.13167.585346.444380@slothrop.zope.com> <3cc03743.26549916@mail.wanadoo.dk> <15552.19647.972709.314308@slothrop.zope.com> Message-ID: <3cc059cd.35392381@mail.wanadoo.dk> [Jeremy]
>If we have
>
>expr = Num(object value, num_type type)
>num_type = Int | Long | Float | Complex
>
>Then I can parse the string and create a Num, passing the PyObject * >and setting the appropriate type flag. I could also handle separate >IntNum, LongNum, etc. ctors, but that seems like more nodes than we >really need. > >If the Num node(s) need to have the type specified, then I'd like it >to take an object, not a string. I think this is a little overkill. I only needed the 'type' arg when the value was a string. If we decide to use an 'object value', I have no need for the type argument. So this is sufficient for me:

    expr = Num(object value)

and I'm guessing it is for you too. I just implemented the 'object' type and the only drawback is that the parser package now depends on the classes in the core package. It is a little ugly, but it is in no way a technical problem.
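Finn's single-field Num(object value) works because an already-parsed number carries its own type. A code generator can therefore branch on the value itself instead of on a separate num_type flag. A minimal Python sketch follows. The dataclass and the LOAD_* instruction names are hypothetical. Under Python 3 the Int and Long cases fall into a single branch, since there is only one integer type.

```python
from dataclasses import dataclass


@dataclass
class Num:
    # The single field of expr = Num(object value): an already-parsed number.
    value: object


def gen_num(node):
    """Choose a (hypothetical) load instruction from the value's own type."""
    v = node.value
    if isinstance(v, complex):
        return ("LOAD_COMPLEX", v)
    if isinstance(v, float):
        return ("LOAD_FLOAT", v)
    if isinstance(v, int):
        # Python 3 has a single int type, so Int and Long share this branch.
        return ("LOAD_INT", v)
    raise TypeError("Num node holds a non-number: %r" % (v,))
```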
regards, finn From bckfnn@worldonline.dk Sun Apr 21 13:49:02 2002 From: bckfnn@worldonline.dk (Finn Bock) Date: Sun, 21 Apr 2002 12:49:02 GMT Subject: [Compiler-sig] More Jython progress Message-ID: <3cc2aa24.4619202@mail.wanadoo.dk> Hi, Based on the current AST tree and a modified CodeCompiler, I can now generate Java bytecode. I'm sure there are still a few bugs in the generated code but it passes our initial (rather small) test suite. The patch to the current Jython CVS is available here: http://sourceforge.net/tracker/?func=detail&atid=312867&aid=546737&group_id=12867 The next phase for Jython is to port jythonc (the Java source code generator) to use the new AST tree. Some observations about the AST: I wonder if it would make sense to map the 'identifier' type to a Name node. In some cases I create a Name node based on an identifier just so it can take part in the visitor. From the end of visitFunctionDef():

    set(new Name(node.name, Name.Store));

Since the function name was initially parsed as a Name node anyway, I think it would be better to keep the original Name node in the FunctionDef node. The ListComp and the way I use it bugs me a little. I'll admit it is a clever way of representing a listcomp but I have been reusing the visitFor() and visitIf() methods to generate the loop and branching code.
Since I wanted to continue to do that, I build a series of For() and If() statements from the listcomp:

    set(new Name(tmp_append, Name.Store));
    stmtType n = new Expr(new Call(new Name(tmp_append, Name.Load),
                                   new exprType[] { node.target },
                                   new keywordType[0], null, null));
    for (int i = node.generators.length - 1; i >= 0; i--) {
        listcompType lc = node.generators[i];
        for (int j = lc.ifs.length - 1; j >= 0; j--) {
            n = new If(lc.ifs[j], new stmtType[] { n }, null);
        }
        n = new For(lc.target, lc.iter, new stmtType[] { n }, null);
    }
    visit(n);
    visit(new Delete(new exprType[] { new Name(tmp_append, Name.Del) }));

I think it is quite clever to reuse the For() and If() codegen, but I don't like creating new nodes on the fly. These new nodes will not have the right line numbers or any other additional information that we attach to the nodes. I would prefer that the For() and If() nodes were stored in the ListComp node. regards, finn From bckfnn@worldonline.dk Sun Apr 21 16:20:08 2002 From: bckfnn@worldonline.dk (Finn Bock) Date: Sun, 21 Apr 2002 15:20:08 GMT Subject: [Compiler-sig] Re: [Python-checkins] python/nondist/sandbox/ast asdl.h,1.4,1.5 astmodule.c,1.5,1.6 In-Reply-To: References: Message-ID: <3cc2d51f.15622413@mail.wanadoo.dk> [Neal in a checkin message] >Get build working again, Thanks Neal, I'm sorry for breaking the build deliberately without supplying a fix. regards, finn From neal@metaslash.com Sun Apr 21 16:30:52 2002 From: neal@metaslash.com (Neal Norwitz) Date: Sun, 21 Apr 2002 11:30:52 -0400 Subject: [Compiler-sig] Re: [Python-checkins] python/nondist/sandbox/ast asdl.h,1.4,1.5 astmodule.c,1.5,1.6 References: <3cc2d51f.15622413@mail.wanadoo.dk> Message-ID: <3CC2DB2C.3D6ADCA@metaslash.com> Finn Bock wrote: > > [Neal in a checkin message] > > >Get build working again, > > Thanks Neal, I'm sorry for breaking the build deliberately without > supplying a fix. No problem. I'm glad we can make progress on both CPython & Jython.
Neal From jeremy@zope.com Sun Apr 21 17:49:32 2002 From: jeremy@zope.com (Jeremy Hylton) Date: Sun, 21 Apr 2002 12:49:32 -0400 Subject: [Compiler-sig] Re: [Python-checkins] python/nondist/sandbox/ast asdl.h,1.4,1.5 astmodule.c,1.5,1.6 In-Reply-To: References: Message-ID: <15554.60828.475903.594444@slothrop.zope.com> >>>>> "NN" == nnorwitz writes: NN> Add XXX question about why we are using #define rather than NN> typedef I've got an uncommitted change for this, too. I don't know why it's a define. (Who wrote that code? ;-) Jeremy From jeremy@zope.com Mon Apr 22 05:31:53 2002 From: jeremy@zope.com (Jeremy Hylton) Date: Mon, 22 Apr 2002 00:31:53 -0400 Subject: [Compiler-sig] More Jython progress In-Reply-To: <3cc2aa24.4619202@mail.wanadoo.dk> References: <3cc2aa24.4619202@mail.wanadoo.dk> Message-ID: <15555.37433.560666.702582@slothrop.zope.com> >>>>> "FB" == Finn Bock writes: FB> Hi, Based on the current AST tree and a modified CodeCompiler, I FB> can now generate Java bytecode. I'm sure there are still a few FB> bugs in the generated code but it passes our initial (rather FB> small) test suite. FB> The patch to the current Jython CVS is available here: FB> http://sourceforge.net/tracker/?func=detail&atid=312867&aid=546737&group_id=12867 FB> The next phase for Jython is to port jythonc (the Java FB> source code generator) to use the new AST tree. I had hoped to look over your code this weekend, but didn't get to it. The subtleties of converting list comprehensions delayed me <0.6 wink>. Is it your intent to re-do the compiler(s) in Jython? In hindsight, it seems clear that you weren't doing this just to kill time, but I didn't realize that both Pythons were in for a compiler overhaul at the same time. FB> Some observations about the AST: I'll have to think about these tomorrow. I hope it's not too much trouble that I changed Dict. FB> The ListComp and the way I use it bugs me a little.
I'll admit FB> it is a clever way of representing a listcomp but I have been FB> reusing the visitFor() and visitIf() methods to generate the FB> loop and branching code. Since I wanted to continue to do that I FB> build a series of For() and If() statements from the listcomp: I just looked at the compiler package and saw that its visitFor() and visitListFor() are quite similar. The visitIf() and visitListIf() aren't very similar, presumably because a lot of logic is in the visitListComp() method. (The compiler package uses a ListComp() object with two children -- a binding expression and a list of ListCompFor and ListCompIf nodes.) I'd need to think harder about how the two kinds of fors and ifs could be merged here. Perhaps you could accomplish this with helper methods instead of creating throwaway nodes, such as a _visit_generic_for() that could be called by either visitFor() or visitListComp()? Jeremy From jeremy@zope.com Mon Apr 22 05:35:46 2002 From: jeremy@zope.com (Jeremy Hylton) Date: Mon, 22 Apr 2002 00:35:46 -0400 Subject: [Compiler-sig] sharing AST between C and Python Message-ID: <15555.37666.880198.190043@slothrop.zope.com> Has anyone given any thought to how to share an AST between the Python core and user code written in Python? I think I mentioned earlier that I was leaning towards an explicit "pickling" phase to copy an AST across the boundary rather than trying to share references to a single struct. Does anyone else have an opinion? I'm asking because the first draft of astmodule.c is winding down, and I'll need the pickler soon if I want to do any noodling with the converted AST. Jeremy PS That's pickling in the ASDL sense, which is similar to, but not the same as, pickling in the Python sense.
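The copy-across-the-boundary approach can be shown in toy form in Python. The tree is flattened into a byte string in prefix order, with a constructor tag followed by each field, and is rebuilt from those bytes on the other side. The tag numbers, the 'i'/'s' field markers, and the (kind, fields-dict) node representation are all invented for this sketch. They are not the ASDL pickle format, only an illustration of the idea.

```python
import struct

# Hypothetical constructor tags and field lists; a real ASDL pickler would
# derive these from the .asdl definition.
TAGS = {"Name": 1, "Num": 2, "BinOp": 3}
KINDS = {tag: kind for kind, tag in TAGS.items()}
FIELDS = {"Name": ("id",), "Num": ("n",), "BinOp": ("left", "op", "right")}


def pickle_node(node):
    """Serialize a (kind, fields) tree in prefix order: tag byte, then each field."""
    kind, fields = node
    out = bytearray([TAGS[kind]])
    for name in FIELDS[kind]:
        value = fields[name]
        if isinstance(value, tuple):  # child node
            out += pickle_node(value)
        elif isinstance(value, int):  # integer field: 'i' + 8-byte big-endian
            out += b"i" + struct.pack(">q", value)
        else:  # string field: 's' + 4-byte length + UTF-8 bytes
            data = value.encode("utf-8")
            out += b"s" + struct.pack(">I", len(data)) + data
    return bytes(out)


def unpickle_node(buf, pos=0):
    """Rebuild a tree written by pickle_node; returns (node, next_position)."""
    kind = KINDS[buf[pos]]
    pos += 1
    fields = {}
    for name in FIELDS[kind]:
        marker = buf[pos:pos + 1]
        if marker == b"i":
            (fields[name],) = struct.unpack_from(">q", buf, pos + 1)
            pos += 9
        elif marker == b"s":
            (length,) = struct.unpack_from(">I", buf, pos + 1)
            fields[name] = buf[pos + 5:pos + 5 + length].decode("utf-8")
            pos += 5 + length
        else:  # anything else must be a child node's tag byte
            fields[name], pos = unpickle_node(buf, pos)
    return (kind, fields), pos
```

Because both sides only agree on the byte layout, neither has to hold references into the other's memory. That is the main attraction over sharing a single C struct.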
From bckfnn@worldonline.dk Mon Apr 22 11:37:14 2002 From: bckfnn@worldonline.dk (Finn Bock) Date: Mon, 22 Apr 2002 10:37:14 GMT Subject: [Compiler-sig] More Jython progress In-Reply-To: <15555.37433.560666.702582@slothrop.zope.com> References: <3cc2aa24.4619202@mail.wanadoo.dk> <15555.37433.560666.702582@slothrop.zope.com> Message-ID: <3cc3d9de.2059691@mail.wanadoo.dk> >>>>>> "FB" == Finn Bock writes: > > FB> Hi, Based on the current AST tree and a modified CodeCompiler, I > FB> can now generate Java bytecode. I'm sure there are still a few > FB> bugs in the generated code but it passes our initial (rather > FB> small) test suite. > > FB> The patch to the current Jython CVS is available here: > > FB> http://sourceforge.net/tracker/?func=detail&atid=312867&aid=546737&group_id=12867 > > FB> The next phase for Jython is to port jythonc (the Java > FB> source code generator) to use the new AST tree. [Jeremy] >I had hoped to look over your code this weekend, but didn't get to >it. The subtleties of converting list comprehensions delayed me <0.6 >wink>. Is it your intent to re-do the compiler(s) in Jython? Yes. One is done already; one compiler is still to do. If you want to look, the AST is created by org.python.p2.TreeBuilder, which is called by the actions specified in the JavaCC grammar. The compiler is located in org.python.c2.CodeCompiler. In the c2 package there is also a ScopesCompiler that handles the symbol types (fast_locals, cells, etc.) and an ArgListCompiler that deals with default argument values and argtuple unpacking. All three compiler classes use the visitor pattern I outlined a while ago. >In >hindsight, it seems clear that you weren't doing this just to kill >time, but I didn't realize that both Pythons were in for a compiler >overhaul at the same time. The size of the overhaul is significantly smaller for Jython. Our old syntax tree was almost node for node the same as the new AST, but all the children were anonymous.
Except for a few smaller differences (like listcomp and function arguments), the transformation has been straightforward. The main reason I wanted to switch to the new AST is that we have to create yet another AST visitor, one that does on-the-fly interpretation of the Python code. I did not want to start on this visitor using the old tree; I guessed it would be faster to switch the other compilers to the new AST instead. > FB> Some observations about the AST: > >I'll have to think about these tomorrow. I hope it's not too much >trouble that I changed Dict. The new way of representing Dict keys and values is rather unnatural for Jython because all the existing support functions and PyDictionary() ctors assume that the elements are alternating keys and values. It is not a big issue, and I don't think the slowdown from rearranging the elements will be noticeable. > FB> The ListComp and the way I use it bugs me a little. I'll admit > FB> it is a clever way of representing a listcomp but I have been > FB> reusing the visitFor() and visitIf() methods to generate the > FB> loop and branching code. Since I wanted to continue to do that I > FB> build a series of For() and If() statements from the listcomp: > >I just looked at the compiler package and saw that its visitFor() and >visitListFor() are quite similar. The visitIf() and visitListIf() >aren't very similar, presumably because a lot of logic is in the >visitListComp() method. > > (The compiler package uses a ListComp() object with two children -- > a binding expression and a list of ListCompFor and ListCompIf > nodes.) > >I'd need to think harder about how the two kinds of fors and ifs could >be merged here. Perhaps you could accomplish this with helper methods >instead of creating throwaway nodes? _visit_generic_for() that could >be called by either visitFor() or visitListComp()? I'll think more about it, but the main problem is that visitFor() and visitIf() use recursion, while a ListComp is a sequence.
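One way to reconcile Jeremy's helper-method suggestion with Finn's recursion-versus-sequence concern is to let the recursion run over an index into the generator list. Each level emits its loop header and filters, then calls itself for the next generator, so no intermediate For/If nodes are built. The following is a minimal Python sketch under stated assumptions. The (target, iter, ifs) tuple shape and the function names are hypothetical. The sketch emits Python source text only so the result is easy to check, whereas Jython's CodeCompiler emits bytecode.

```python
def gen_listcomp(elt, generators, emit):
    """Emit Python source for [elt for ... if ...] without building For/If nodes.

    `generators` is a hypothetical flat list of (target, iter, ifs) tuples,
    standing in for the ListComp node's generator sequence.
    """
    emit("tmp = []")
    _gen_level(elt, generators, 0, 0, emit)


def _gen_level(elt, generators, i, depth, emit):
    """Recurse over the sequence: generator i wraps everything after it."""
    if i == len(generators):
        # Innermost body: append the element expression.
        emit("    " * depth + "tmp.append(%s)" % elt)
        return
    target, iterable, ifs = generators[i]
    emit("    " * depth + "for %s in %s:" % (target, iterable))
    depth += 1
    for cond in ifs:
        # Each filter nests one level deeper, like the If() nodes Finn builds.
        emit("    " * depth + "if %s:" % cond)
        depth += 1
    _gen_level(elt, generators, i + 1, depth, emit)
```

Called with the element "x * y" and generators [("x", "xs", ["x > 0"]), ("y", "ys", [])], it emits a loop over xs, an if, a nested loop over ys, and the append, in that order. A real code generator could keep each generator's own line number at every level, which addresses Finn's complaint about throwaway nodes.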
regards, finn From jeremy@zope.com Wed Apr 24 00:12:34 2002 From: jeremy@zope.com (Jeremy Hylton) Date: Tue, 23 Apr 2002 19:12:34 -0400 Subject: [Compiler-sig] More Jython progress In-Reply-To: <3cc3d9de.2059691@mail.wanadoo.dk> References: <3cc2aa24.4619202@mail.wanadoo.dk> <15555.37433.560666.702582@slothrop.zope.com> <3cc3d9de.2059691@mail.wanadoo.dk> Message-ID: <15557.60002.639949.876456@slothrop.zope.com> I'm afraid I won't have any more time to work on this until the end of the week. A bunch of customer-related projects appeared this week, and I need to devote some time to them. Jeremy PS If anyone else wants to extend astmodule.c, be my guest.