[Python-Dev] Python 3 optimizations...

Stefan Behnel stefan_ml at behnel.de
Fri Jul 23 10:38:32 CEST 2010


stefan brunthaler, 23.07.2010 08:48:
> I guess it would be a good idea to quickly outline my inline caching
> approach, so that we all have a basic understanding of how it works.

Yes, that certainly makes it easier to discuss.


> If we take, for instance, the BINARY_ADD instruction, the interpreter
> evaluates the actual operand types and chooses the matching operation
> implementation at runtime, i.e., operands that are unicode strings
> will be concatenated via unicode_concatenate, while for float
> operands the interpreter ends up invoking float_add via binary_op1.
> Now, a very efficient way to achieve purely interpretative inline
> caching is to quicken the type-generic BINARY_ADD instruction to a
> type-dependent FLOAT_ADD instruction (this technique, i.e., inline
> caching via quickening, is the primary contribution of my ECOOP
> paper). Hence, I have a very simple code generator that generates
> type-dependent interpreter instructions in a pre-compilation step of
> the interpreter and uses runtime type information to quicken/rewrite
> instructions.
> Aside from the operators, I have implemented this quickening
> technique for the FOR_ITER, COMPARE_OP and CALL_FUNCTION
> instructions.

It sounds like wpython (a CPython derivative with a wider set of 
bytecode instructions) could benefit from this.

Do I understand correctly that you modify the bytecode of 
modules/functions at runtime?
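
If so, just to make the discussion concrete, I imagine the mechanics
look roughly like the following sketch (plain Python with made-up
opcode names; the real thing would obviously live in the C-level
dispatch loop):

    BINARY_ADD, FLOAT_ADD, UNICODE_CONCAT = 0, 1, 2  # made-up opcodes

    def binary_add(code, pc, left, right):
        # Generic handler: dispatch on the runtime operand types, then
        # rewrite ("quicken") this bytecode position in place so that
        # the next execution jumps straight to a specialised handler.
        if isinstance(left, float) and isinstance(right, float):
            code[pc] = FLOAT_ADD       # next time: no type dispatch
        elif isinstance(left, str) and isinstance(right, str):
            code[pc] = UNICODE_CONCAT  # next time: direct concatenation
        return left + right            # semantics stay unchanged

The specialised derivative would presumably still carry a guard so
that it can fall back to the generic case when the operand types
change later on.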


>> I'm absolutely interested, although not for the CPython project but for
>> Cython. I wonder how you do inline caching in Python if the methods of a
>> type can be replaced by whatever at runtime. Could you elaborate on that?
>
> Currently, I only provide optimized derivatives for several kinds of
> call targets, i.e., depending on whether a call target is a C
> function with varargs or a Python function/method; this already
> eliminates a lot of overhead from invoking call_function.

Ah, yes, that makes good sense. So you basically add an intermediate step 
to calls that provides faster dispatch for known C functions.
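
In Python terms, I would picture it roughly like this (opcode names
invented for illustration; the actual dispatch happens in C inside
call_function):

    import types

    CALL_FUNCTION, CALL_C_VARARGS, CALL_PYFUNC = 0, 1, 2  # made up

    def call_function(code, pc, func, args):
        # Generic call handler: classify the call target once, then
        # quicken the call site to a derivative that skips these
        # checks on subsequent executions.
        if isinstance(func, types.BuiltinFunctionType):
            code[pc] = CALL_C_VARARGS  # C function, e.g. METH_VARARGS
        elif isinstance(func, types.FunctionType):
            code[pc] = CALL_PYFUNC     # plain Python function
        return func(*args)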


>> Or do you restrict yourself to builtin types?
>
> Currently, my approach provides optimized derivative instructions for
> the standard library, e.g., unicode strings, numerical objects,
> containers, and iterators.

I'm interested in the code that determines what can be optimised in what 
way. I read that Jython recently received a contribution that provides type 
information for lots of modules and builtins, but having something like 
that for CPython would be cool.


>> That might be worth it
>> already; just think of list.append(). We have an optimistic optimisation for
>> object.append() in Cython that gives us massive speed-ups in loops that
>> build lists, even if we don't know at compile time that we are dealing with
>> lists.
>>
> Yes, that sounds like a reasonable thing to do. I could provide much
> more optimized derivatives based on application profiles, too. Since I
> use a simple code generator for generating the derivatives, it would
> also be possible to provide end-users with the means to analyze their
> apps and generate optimized instruction derivatives matching their
> profile.
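
For reference, the optimistic list.append() optimisation mentioned
above boils down to something like this. Cython does it at the C
level; this is just a Python rendering of the idea:

    def optimistic_append(obj, item):
        # Optimistic fast path: if the object turns out to be an
        # exact list at runtime, call the well-known list.append
        # directly and bypass the dynamic attribute lookup; otherwise
        # fall back to the generic call. Semantics are preserved
        # either way.
        if type(obj) is list:
            list.append(obj, item)  # fast path
        else:
            obj.append(item)        # generic fallback

The win comes from loops: the type test is cheap compared to
repeating a full attribute lookup and bound-method call on every
iteration.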

Such a profile-guided approach would also be very useful for Cython. Think of a profiler 
that runs a program in CPython and tells you exactly what static type 
annotations to put where in your Python code to make it compile to a fast 
binary with Cython. Or, even better, it could just spit out a .pxd file 
that you drop next to your .py file and that provides the static type 
information for you.
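
A first cut of such a profiler could be as simple as a decorator that
records the argument types actually seen at runtime (a toy sketch,
nothing Cython-specific about it):

    import collections
    import functools

    observed = collections.defaultdict(set)

    def record_types(func):
        # Toy profiler: remember which argument types each function
        # is called with, as raw material for suggesting static type
        # annotations (or generating a .pxd file) afterwards.
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            observed[func.__name__].add(
                tuple(type(a).__name__ for a in args))
            return func(*args, **kwargs)
        return wrapper

If a function only ever sees, say, ('int', 'int'), that is a strong
hint for the annotation to emit.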

Stefan


