[summerofcode] High-level python optimizer.

Wed Jun 8 17:08:19 CEST 2005

Hello !
This is my project proposal: High-level python optimizer.

1. Pre-history.
Python optimization is difficult task because python's dynamic nature. 
Applying many of optimization techniques used in other (static typed) 
languages to python is almost impossible. There was an attempt to create 
python optimizer that works on the byte-code 
(http://www.foretec.com/python/workshops/1998-11/proceedings/papers/montanaro/montanaro.html), 
but it speeds-up execution only by a few percent.
Different approach is taken in the psyco project. It implements JIT 
compilation using run-time specialization. The results is very impressive. 
But psyco does only simple local run-time optimizations, it can't look at the 
whole program.

2. Project proposal.
My proposal is to create high-level optimizer, that will work on the source 
tree of the whole program. It will transform the source tree to data-flow 
graph that represents the flow of the data from program input (in general 
sence) to output. The goal is elimination of all unnecessarily dynamics i.e. 
making assumptions about actual types (and may be values) of variables and 
their life-times. Of course this can't be done for any variable in the 
program, but (usually) for most of them (just try to open any python program 
and traverse it from the very beginning visiting any called function). Once 
we have got this information we can apply a lot of traditional well-known 
optimization techniques as well as python-specific ones. Then we can compile 
and run optimized tree, or transform it back to python program. We can still 
use psyco for JIT, and maybe we can even provide psyco with gathered 
information about variable types.

3. Implementation notes.
This approach for optimization will be the most efficient if applied to 
complete python program (not the stand-alone library) and if the sources of 
all used modules will be available to optimizer. The former is needed to 
extract the maximum information from input data, stand-alone modules can be 
optimized too if it's acceptable input types are somehow specified. The 
source of used modules is needed to extract information from used functions 
such as possible return types, exceptions and side-effects and to make 
partial evaluated version of module's functions. To work better with C 
modules one can specify needed information about module's function to the 
optimizer.

So the actual implementation of such optimizer is divided into three parts:
(a) creating general framework that will handle building data-flow graph, 
gathering information and then applying (pluggable) optimizer
(b) Writing optimizers
(c) describing information about standard modules written in C.
Part (a) is the most important and interesting part of the project, parts (b) 
and (c) can be always simply extended later (of course simplicity depends on 
good design of (a)).

If someone from PSF agreed to be my mentor, please contact me.

-- 
                  Best regards,
                     Vladimir