[Python-Dev] Python 3 optimizations continued...

stefan brunthaler s.brunthaler at uci.edu
Mon Aug 29 20:33:14 CEST 2011


> Perhaps there would be something to say given patches/overviews/specifics.
>
Currently I don't have patches, but for an overview and specifics, I
can provide the following:
* My optimizations basically rely on quickening to incorporate
run-time information.
* I use two separate instruction dispatch routines, and use profiling
to switch from the regular Python 3 dispatch routine to an optimized
one (the implementation is actually vice versa, but that is not
important now)
* The optimized dispatch routine has a changed instruction format
(word-sized instead of bytecodes) that allows for regular instruction
decoding (without the HAS_ARG-check) and inlinind of some objects in
the instruction format on 64bit architectures.
* I use inline-caching based on quickening (passes almost all
regression tests [302 out of 307]), eliminate reference count
operations using quickening (passes but has a memory leak), promote
frequently accessed local variables to their dedicated instructions
(passes), and cache LOAD_GLOBAL/LOAD_NAME objects in the instruction
encoding when possible (I am working on this right now.)

The changes I made can be summarized as:
* I changed some header files to accommodate additional information
(Python.h, ceval.h, code.h, frameobject.h, opcode.h, tupleobject.h)
* I changed mostly abstract.c to incorporate runtime-type feedback.
* All other changes target mostly ceval.c and all supplementary code
is in a sub-directory named "opt" and all generated files in a
sub-directory within that ("opt/gen").
* I have a code generator in place that takes care of generating all
the functions; it uses the Mako template system for creating C code
and does not necessarily need to be shipped with the interpreter
(though one can play around and experiment with it.)

So, all in all, the changes are not that big to the actual
implementation, and most of the code is generated (using sloccount,
opt has 1990 lines of C, and opt/gen has 8649 lines of C).

That's a quick summary, if there are any further or more in-depth
questions, let me know.

best,
--stefan


More information about the Python-Dev mailing list