[Python-Dev] Python 3 optimizations...
Stefan Behnel
stefan_ml at behnel.de
Fri Jul 23 10:38:32 CEST 2010
stefan brunthaler, 23.07.2010 08:48:
> I guess it would be a good idea to quickly outline my inline caching
> approach, so that we all have a basic understanding of how it works.
Yes, that certainly makes it easier to discuss.
> If we take for instance the BINARY_ADD instruction, the interpreter
> evaluates the actual operand types and chooses the matching operation
> implementation at runtime, i.e., operands that are unicode strings
> will be concatenated via unicode_concatenate; for float operands, on
> the other hand, the interpreter would end up invoking float_add via
> binary_op1. Now, a very efficient way to achieve purely interpretative
> inline caching is to quicken the type-generic BINARY_ADD instruction
> to a type-dependent FLOAT_ADD instruction (this technique, i.e.,
> inline caching via quickening, is the primary contribution of my ECOOP
> paper). Hence, I have a very simple code generator that generates
> type-dependent interpreter instructions in a pre-compile step of the
> interpreter, and uses runtime type information to quicken/rewrite
> instructions.
> Aside from the operators, I have implemented this quickening technique
> for the FOR_ITER, COMPARE_OP and CALL_FUNCTION instructions.
It sounds like wpython (a CPython derivative with a wider set of byte
code instructions) could benefit from this.
Do I understand correctly that you modify the byte code of
modules/functions at runtime?
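For concreteness, the quickening scheme described above might be sketched
in pure Python roughly like this (the opcode constants, the Frame class
and the rewriting mechanism are invented for illustration; the actual
implementation rewrites CPython's bytecode array in place):

```python
# Illustrative sketch of inline caching via quickening. All names here
# are hypothetical; only the rewriting idea mirrors the described design.

BINARY_ADD, FLOAT_ADD, UNICODE_CONCAT = range(3)

class Frame:
    def __init__(self, code):
        self.code = list(code)   # mutable, so instructions can be rewritten

def float_add(a, b):
    return a + b                 # specialized: no type dispatch needed

def unicode_concat(a, b):
    return a + b

def binary_add(frame, pc, a, b):
    # Generic instruction: inspect the operand types once, then quicken
    # the instruction at `pc` so later executions skip the dispatch.
    if type(a) is float and type(b) is float:
        frame.code[pc] = FLOAT_ADD
        return float_add(a, b)
    if type(a) is str and type(b) is str:
        frame.code[pc] = UNICODE_CONCAT
        return unicode_concat(a, b)
    return a + b                 # fall back to the generic path
```

After the first execution with float operands, the instruction has been
rewritten in place, so the type check is paid only once per call site.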
>> I'm absolutely interested, although not for the CPython project but for
>> Cython. I wonder how you do inline caching in Python if the methods of a
>> type can be replaced by whatever at runtime. Could you elaborate on that?
>
> Currently, I only provide optimized derivatives for several separate
> call targets, i.e., whether a call target is a C function with
> varargs, or a Python function/method--this already eliminates a lot of
> overhead from invoking call_function.
Ah, yes, that makes good sense. So you basically add an intermediate step
to calls that provides faster dispatch for known C functions.
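Something like the following toy sketch, where a call site records what
kind of target it last saw (the instruction constants and the list-based
call site are purely illustrative, not the actual mechanism):

```python
# Hypothetical sketch of call-target specialization: a generic CALL
# instruction classifies its target and rewrites itself, so subsequent
# calls at the same site take the specialized path directly.
import types

CALL_FUNCTION, CALL_C_FUNCTION, CALL_PY_FUNCTION = range(3)

def call_function(site, func, *args):
    if isinstance(func, types.BuiltinFunctionType):
        site[0] = CALL_C_FUNCTION    # C function, e.g. len or abs
    elif isinstance(func, types.FunctionType):
        site[0] = CALL_PY_FUNCTION   # pure-Python function
    return func(*args)
```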
>> Or do you restrict yourself to builtin types?
>
> Currently, my approach provides optimized derivative instructions for
> the standard library, e.g., unicode strings, numerical objects,
> containers, and iterators.
I'm interested in the code that determines what can be optimised in what
way. I read that Jython recently received a contribution that provides type
information for lots of modules and builtins, but having something like
that for CPython would be cool.
>> That might be worth it
>> already, just think of list.append(). We have an optimistic optimisation for
>> object.append() in Cython that gives us massive speed-ups in loops that
>> build lists, even if we don't know at compile time that we are dealing with
>> lists.
>>
> Yes, that sounds like a reasonable thing to do. I could provide much
> more optimized derivatives based on application profiles, too. Since I
> use a simple code generator for generating the derivatives, it would
> also be possible to provide end-users with the means to analyze their
> apps and generate optimized instruction derivatives matching their
> profile.
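The optimistic append optimisation quoted above can be illustrated in
pure Python (Cython performs this at the C level; `optimistic_append` is
a hypothetical helper, not Cython's actual API):

```python
def optimistic_append(obj, item):
    # Fast path: if the object really is a list, append via the list
    # type directly, skipping the generic attribute lookup. Otherwise
    # fall back to the normal method call, preserving semantics for
    # list subclasses and unrelated types.
    if type(obj) is list:
        list.append(obj, item)
    else:
        obj.append(item)
```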
Such an approach would also be very useful for Cython. Think of a profiler
that runs a program in CPython and tells you exactly what static type
annotations to put where in your Python code to make it compile to a fast
binary with Cython. Or, even better, it could just spit out a .pxd file
that you drop next to your .py file and that provides the static type
information for you.
Stefan