[summerofcode] High-level python optimizer.

Mark Dufour mark.dufour at gmail.com
Fri Jun 10 15:39:44 CEST 2005


Hi Vova,

You're right, conversion to C++ cannot work for many Python programs.
But how many programs are written in C++, as opposed to Python? -^ My
aim is to be able to use Python as a 'high-level C++', so that you
both get a nice syntax, as well as very efficient code (of course
programmers need to realize what kind of compiler they are dealing
with, as always..)

I am just worried that if you plan on doing a _global_ dataflow
analysis, it will be difficult to remain precise, when all sort of
dynamic things happen in the code. I'm not sure, but I think the
people of PyPy are already working heavily on the kinds of
optimizations as you are talking about, but applying it during
run-time, so they can be somewhat precise for dynamic code - ie work
for any Python program (Armin? :)) I guess it would be cool if you
could join their efforts, maybe by submitting a proposal for
implementing a certain feature?

As for my type inference code, it will probably not be very stable for
at least a few more months. Of course, instead of generating C++ code,
it could be used for other purposes. Hm, if you would like to get a
'good' overview of the theoretical work done on type inference for
dynamic languages, you might check out my survey at
http://kascade.org/optimizing_python.pdf. I spent a lot of time going
through the literature..


Mark.

On 6/10/05, Vova <physics at list.ru> wrote:
> Hello !
> Again, just to clarify. My program will do global analysis of the source
> building data-flow graph. This graph makes possible to predict types and
> values of some variables and check functions for side-effects (depending on
> argument types). Using this information optimizer can make some good
> optimizations. Being well implemented it gives the following guarantees:
> 1. This will work for ANY python program.
> 2. Optimized program will always work the same as the original program.
> Optimizer will touch only parts that it can optimize. By supplying optimizer
> with some user-created information results can be improved, but this
> information is not required.
> 
> The aims of this optimizer is to:
> 1. Allow programmers not to sacrifice readability and architecture of the code
> in favor of performance.
> 2. Allow programmers to use some nice language features (functional
> programming, for example) not thinking that this features are inefficient.
> 3. Allow improving old programs using new language features (for example using
> generators instead of lists).
> 4. Discover more about program-transformation algorithms.
> 
> Compiling code to C++ does not give the same guarantees. It will work only for
> static programs, only for subset of python. How many real-world programs are
> in this subset ? But of course seep-up of such compilation will be greater. I
> think the aim of such compiler is to allow using static subset of python
> syntax to write fast programs (am I wrong?).
> 
> Our projects is just different ones, however some correlations of course
> present. May be we can even help each other in overlapping areas.
> 
> --
>           Best regards,
>             Vladimir
> 
> On Friday 10 June 2005 13:44, Mark Dufour wrote:
> > Hi Vova,
> >
> > It's very hard to prove anything about a program without a precise
> > type analysis. As Armin mentioned, implementing this alone, for
> > Python, represents quite an effort. But it is the most important thing
> > for optimizing pure Python: you can't specify types in Python (without
> > messing up your source code, or using Pyrex), but you can optimize
> > other things by hand. In my case, it's even more important, because
> > conversion to C++ essentially gives you most of the optimizations you
> > mention for free.  A really high-level optimization that remains in
> > this case is for example trying to allocate things on the stack, or
> > even statically preallocate them (because you do this manually in
> > C++). Other examples of high-level optimizations imo are for example
> > optimization of bounds checking, optimizing memory layout in many
> > ways, and removing 'intermediate lists' as in functional language
> > compilers.
> >
> > But, I don't think a global approach can work well for even moderately
> > dynamic programs, as it's very hard to determine precisely what is
> > going on. (Consider just the use of reflection - it's formally
> > undecidable what kind of attribute names are involved, even without
> > user input.) A global approach then, can imo best consider a subset of
> > Python, and translate code into a language such as C++ (as it operates
> > at the same abstraction level as Python, and you get many
> > optimizations for free.) I really like the elegance of such a
> > solution: you can reuse lots of work done in for example GCC, and you
> > get the ultimate in speed for relatively 'static' programs - which
> > many people, or at least myself, mostly write. I love the syntax of
> > Python, but I don't really use for example dynamic typing for most of
> > the time (it's just convenient to be able to leave out type
> > declarations :D)
> >
> > On the other hand, a non-global approach I think cannot do many
> > high-level optimizations, as it doesn't have enough information or
> > (when done during run-time) does not have the time to calculate the
> > right information. A non-global, run-time approach does seem like the
> > best long-term solution. With Moore's law and all, it can optimize
> > most code to an acceptable level, and still be able to work with the
> > full Python semantics.
> >
> > So, I am left wondering a bit about the approach you would like to
> > take. If you take the global route, then why not convert to C++. But
> > that would be sort of the same thing as Starkiller and Shedskin (my
> > compiler) aim to do. If you take the non-global route, then real
> > high-level optimizations seem out of the question, or at least very
> > complicated? Maybe it would be best to look into the things Armin Rigo
> > and the others are doing with PyPy, perhaps find something you can
> > contribute there?
> >
> > Finally, I have no idea whether the source code for Starkiller is
> > available, or if so, if it's readable. The website seems gone, and I
> > don't know what happened. Well, I know my source code is available,
> > but not very readable -^ (just mail me, and I'll send you this 6000
> > line can of worms) - I'll be probably still modifying it heavily for
> > the foreseeable future..
> >
> >
> > Mark.
> >
> 


-- 
if vars: self.output('; '.join([self.nodetype(var)+' '+name for
(name,var) in vars.items()])+';')


More information about the summerofcode mailing list