[Python-Dev] Speeding up CPython 5-10%
Yury Selivanov
yselivanov.ml at gmail.com
Fri Jan 29 10:06:38 EST 2016
Hi Damien,
BTW I just saw (and backed!) your new Kickstarter campaign
to port MicroPython to ESP8266, good stuff!
On 2016-01-29 7:38 AM, Damien George wrote:
> Hi Yury,
>
> [..]
>> Do you use opcode dictionary caching only for LOAD_GLOBAL-like
>> opcodes? Do you have an equivalent of LOAD_FAST, or you use
>> dicts to store local variables?
> The opcodes that have dict caching are:
>
> LOAD_NAME
> LOAD_GLOBAL
> LOAD_ATTR
> STORE_ATTR
> LOAD_METHOD (not implemented yet in mainline repo)
>
> For local variables we use LOAD_FAST and STORE_FAST (and DELETE_FAST).
> Actually, there are 16 dedicated opcodes for loading from positions
> 0-15, and 16 for storing to these positions. Eg:
>
> LOAD_FAST_0
> LOAD_FAST_1
> ...
>
> Mostly this is done to save RAM, since LOAD_FAST_0 is 1 byte.
Interesting. This might actually make CPython slightly faster
too. Worth trying.
>
>> If we change the opcode size, it will probably affect libraries
>> that compose or modify code objects. Modules like "dis" will
>> also need to be updated. And that's probably just the tip of
>> the iceberg.
>>
>> We can still implement your approach if we add a separate
>> private 'unsigned char' array to each code object, so that
>> LOAD_GLOBAL can store the key offsets. It should be a bit
>> faster than my current patch, since it has one less level
>> of indirection. But this way we lose the ability to
>> optimize LOAD_METHOD, simply because it requires more memory
>> for its cache. In any case, I'll experiment!
> The problem with that approach (having a separate array for
> offset_guess) is: how does a given LOAD_GLOBAL opcode know where to
> look in that array? The second LOAD_GLOBAL in your bytecode should
> look into the second entry in the array, but how does it know that?
>
>
I've changed my approach a little bit. Now I have a simple
function [1] to initialize the cache for code objects that
are called frequently enough.
It walks through the code object's opcodes and creates the
appropriate offset/cache tables.
Then, in the ceval loop, I have a couple of convenient macros
to work with the cache [2]. They use the INSTR_OFFSET() macro
to locate the cache entry via the offset table initialized
by [1].
Thanks,
Yury
[1] https://github.com/1st1/cpython/blob/opcache4/Objects/codeobject.c#L167
[2] https://github.com/1st1/cpython/blob/opcache4/Python/ceval.c#L1164