[Python-Dev] Speeding up CPython 5-10%

Yury Selivanov yselivanov.ml at gmail.com
Fri Jan 29 10:06:38 EST 2016


Hi Damien,

BTW I just saw (and backed!) your new Kickstarter campaign
to port MicroPython to ESP8266, good stuff!

On 2016-01-29 7:38 AM, Damien George wrote:
> Hi Yury,
>
> [..]
>> Do you use opcode dictionary caching only for LOAD_GLOBAL-like
>> opcodes?  Do you have an equivalent of LOAD_FAST, or you use
>> dicts to store local variables?
> The opcodes that have dict caching are:
>
> LOAD_NAME
> LOAD_GLOBAL
> LOAD_ATTR
> STORE_ATTR
> LOAD_METHOD (not implemented yet in mainline repo)
>
> For local variables we use LOAD_FAST and STORE_FAST (and DELETE_FAST).
> Actually, there are 16 dedicated opcodes for loading from positions
> 0-15, and 16 for storing to these positions.  E.g.:
>
> LOAD_FAST_0
> LOAD_FAST_1
> ...
>
> Mostly this is done to save RAM, since LOAD_FAST_0 is 1 byte.

Interesting.  This might actually make CPython slightly faster
too.  Worth trying.
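
A rough sketch of what that could look like in the ceval switch
(LOAD_FAST_0 etc. are hypothetical opcodes here, nothing that
exists in CPython today):

    case LOAD_FAST_0:
    case LOAD_FAST_1:
    case LOAD_FAST_2:
    case LOAD_FAST_3: {
        /* The local slot index is encoded in the opcode itself,
           so no oparg byte has to be fetched. */
        PyObject *value = fastlocals[opcode - LOAD_FAST_0];
        if (value == NULL)
            goto error;    /* unbound local; error handling elided */
        Py_INCREF(value);
        PUSH(value);
        FAST_DISPATCH();
    }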

>
>> If we change the opcode size, it will probably affect libraries
>> that compose or modify code objects.  Modules like "dis" will
>> also need to be updated.  And that's probably just the tip of the
>> iceberg.
>>
>> We can still implement your approach if we add a separate
>> private 'unsigned char' array to each code object, so that
>> LOAD_GLOBAL can store the key offsets.  It should be a bit
>> faster than my current patch, since it has one less level
>> of indirection.  But this way we lose the ability to
>> optimize LOAD_METHOD, simply because it requires more memory
>> for its cache.  In any case, I'll experiment!
> The problem with that approach (having a separate array for offset_guess)
> is: how do you know where to look in that array for a given
> LOAD_GLOBAL opcode?  The second LOAD_GLOBAL in your bytecode should
> look into the second entry in the array, but how does it know?
>
>

I've changed my approach a little bit.  Now I have a simple
function [1] to initialize the cache for code objects that
are called frequently enough.

It walks through the code object's opcodes and creates the
appropriate offset/cache tables.

Then, in the ceval loop I have a couple of convenient macros
to work with the cache [2].  They use the INSTR_OFFSET() macro
to locate the cache entry via the offset table initialized
by [1].
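
To answer your earlier question: the offset table is indexed by
the bytecode offset of the instruction itself, so the second
LOAD_GLOBAL automatically finds its own entry.  Roughly like this
(illustrative names only; the real struct/field names in [1] and
[2] differ):

    typedef struct {
        PyObject *obj;      /* e.g. the resolved global            */
        uint64_t  guard;    /* e.g. a dict version tag for checks  */
    } cache_entry;

    /* Built lazily by [1] once the code object is "hot":
       offset_table[i] holds the cache slot (plus 1) for the
       opcode at bytecode offset i, or 0 if that opcode has no
       cache. */
    unsigned char *offset_table;
    cache_entry   *entries;

    /* In the eval loop ([2]), something along the lines of: */
    unsigned char slot = offset_table[INSTR_OFFSET()];
    cache_entry *cache = slot ? &entries[slot - 1] : NULL;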

Thanks,
Yury

[1] https://github.com/1st1/cpython/blob/opcache4/Objects/codeobject.c#L167
[2] https://github.com/1st1/cpython/blob/opcache4/Python/ceval.c#L1164

