[Python-Dev] Speeding up CPython 5-10%

Wed Jan 27 15:10:03 EST 2016

Hi Yuri,

I think these are great ideas to speed up CPython.  They are probably
the simplest yet most effective ways to get performance improvements
in the VM.

MicroPython has had LOAD_METHOD/CALL_METHOD from the start (inspired
by PyPy, and the main reason to have it is because you don't need to
allocate on the heap when doing a simple method call).  The specific
opcodes are:

LOAD_METHOD # same behaviour as you propose
CALL_METHOD # for calls with positional and/or keyword args
CALL_METHOD_VAR_KW # for calls with one or both of */**

We also have LOAD_ATTR, CALL_FUNCTION and CALL_FUNCTION_VAR_KW for
non-method calls.

MicroPython also has dictionary lookup caching, but it's a bit
different to your proposal.  We do something much simpler: each opcode
that has a cache ability (eg LOAD_GLOBAL, STORE_GLOBAL, LOAD_ATTR,
etc) includes a single byte in the opcode which is an offset-guess
into the dictionary to find the desired element.  Eg for LOAD_GLOBAL
we have (pseudo code):

CASE(LOAD_GLOBAL):
key = DECODE_KEY;
offset_guess = DECODE_BYTE;
if (global_dict[offset_guess].key == key) {
    // found the element straight away
} else {
    // not found, do a full lookup and save the offset
    offset_guess = dict_lookup(global_dict, key);
    UPDATE_BYTECODE(offset_guess);
}
PUSH(global_dict[offset_guess].elem);

We have found that such caching gives a massive performance increase,
on the order of 20%.  The issue (for us) is that it increases bytecode
size by a considerable amount, requires writeable bytecode, and can be
non-deterministic in terms of lookup time.  Those things are important
in the embedded world, but not so much on the desktop.

Good luck with it!

Regards,
Damien.