[Python-Dev] Optimization of Python ASTs: How should we deal with constant values?

Thomas Lee tom at vector-seven.com
Fri May 9 01:22:18 CEST 2008


Nick Coghlan wrote:
>
> There are a lot of micro-optimisations that are actually context 
> independent, so moving them before the symtable pass should be quite 
> feasible - e.g. replacing "return None" with "return", stripping dead 
> code after a return statement, changing a "if not" statement into an 
> "if" statement with the two suites reversed, changing "(1, 2, 3)" into 
> a stored constant, folding "1 + 2" into the constant "3".
>
> I believe the goal is to see how many of the current bytecode 
> optimisations can actually be brought forward to the AST generation 
> stage, rather than waiting until after the bytecode symtable 
> calculation and compilation passes.
>
That's been the aim so far. It's been largely successful with the 
exception of a few edge cases (most notably the functions vs. generator 
stuff). The elimination of unreachable paths (whether they be things 
like "if 0: ..." or "return; ... more code ...") completely breaks 
generators since we might potentially be blowing away "yield" statements 
during the elimination process.

The rest of the optimizations, as far as I can see, are much less scary.
> The current structure goes:
>
> tokenisation->AST construction->symtable construction->bytecode 
> compilation->bytecode optimisation
>
> My understanding of what Thomas is trying to do is to make it look 
> more like this:
>
> tokenisation->AST construction->AST optimisation->symtable 
> construction->bytecode compilation
>
That's exactly right.

I made a quick and dirty attempt at moving the AST optimization step 
after the symtable generation on the train home last night to see if 
Jeremy's suggestion gives us anything. Now, it does make the detection 
of generators a little easier, but it really doesn't give us all that 
much else. I'm happy to post a patch so you can see what I mean, but for 
now I think visiting the FunctionDef subtree to check for Yield nodes 
will be fine.

Again, I might be wrong (as I've often been throughout this process!) 
but let's see how it goes. Obviously a bit less efficient, but function 
bodies really shouldn't be all that deep anyway. I've got a good idea 
about how I'm going to go forward with this.

Cheers,
T



More information about the Python-Dev mailing list