[New-bugs-announce] [issue42115] Caching infrastructure for the evaluation loop: specialised opcodes

Wed Oct 21 23:21:38 EDT 2020

New submission from Pablo Galindo Salgado <pablogsal at gmail.com>:

After https://bugs.python.org/issue42093 and https://bugs.python.org/issue26219, it is becoming clear that we can cache different kinds of information in the evaluation loop to speed up CPython. This observation also rests on the fact that, although Python is dynamic, plenty of code does not exercise that dynamism, so factoring out the "dynamic" parts of execution with a caching mechanism can yield excellent results.

So far, caching LOAD_ATTR and LOAD_GLOBAL has given us two big performance improvements (up to 10-14% in some cases), but I think we can do much, much more. Here are some observations about what I think we can do:

* Instead of adding more caches using the current mechanism, which adds inlined code to each cached opcode in the evaluation loop, we can try to formalize a caching mechanism with a better API that makes it easier to add more opcodes in the future. Keeping this code inline in ceval.c will become difficult to maintain if we keep adding more to it directly.

* Instead of handling the specialization in the same opcode as the original one (LOAD_ATTR currently implements both the slow and the fast path), we could mutate the original code object, replacing the slow, generic opcodes with more specialized ones. The specialized opcodes would also be in charge of switching back to the generic, slow ones if the assumptions that activated them no longer hold.
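
The in-place specialization idea in the second bullet (rewrite a generic opcode into a specialized one, and have the specialized one revert when its assumption breaks) can be sketched with a toy interpreter step. This is a minimal illustration under stated assumptions, not how ceval.c works: `Opcode`, `OP_ADD_GENERIC`, `OP_ADD_INT`, `Value`, `Instr` and `exec_add` are hypothetical names, values are tagged ints or floats rather than PyObjects, and the instruction is rewritten after a single observation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes; not CPython's real opcode set. */
typedef enum { OP_ADD_GENERIC, OP_ADD_INT } Opcode;

/* Hypothetical tagged value standing in for a PyObject. */
typedef enum { KIND_INT, KIND_FLOAT } Kind;
typedef struct {
    Kind kind;
    long i;
    double f;
} Value;

/* One instruction slot in the (mutable) toy bytecode. */
typedef struct {
    Opcode op;
} Instr;

/* Generic (slow) path: inspects the operand kinds on every call. */
static Value add_generic(Value a, Value b) {
    Value r;
    if (a.kind == KIND_INT && b.kind == KIND_INT) {
        r.kind = KIND_INT; r.i = a.i + b.i; r.f = 0.0;
    } else {
        double x = a.kind == KIND_INT ? (double)a.i : a.f;
        double y = b.kind == KIND_INT ? (double)b.i : b.f;
        r.kind = KIND_FLOAT; r.i = 0; r.f = x + y;
    }
    return r;
}

/* Execute one add, specializing or deoptimizing the instruction in place. */
static Value exec_add(Instr *instr, Value a, Value b) {
    assert(instr != NULL);
    switch (instr->op) {
    case OP_ADD_INT:
        if (a.kind == KIND_INT && b.kind == KIND_INT) {
            /* Guard held: fast path with no generic dispatch. */
            Value r = { KIND_INT, a.i + b.i, 0.0 };
            return r;
        }
        /* Guard failed: the assumption no longer holds, so revert. */
        instr->op = OP_ADD_GENERIC;
        return add_generic(a, b);
    case OP_ADD_GENERIC:
    default:
        if (a.kind == KIND_INT && b.kind == KIND_INT) {
            /* Saw int operands: rewrite to the specialized opcode. */
            instr->op = OP_ADD_INT;
        }
        return add_generic(a, b);
    }
}
```

Running an `Instr` initialized to `OP_ADD_GENERIC` on two ints turns it into `OP_ADD_INT`; feeding it an int and a float afterwards sends it back to `OP_ADD_GENERIC`. The same logic would apply unchanged if the rewritten instructions lived in a cached copy of the bytecode instead of the code object itself.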

Obviously, mutating code objects is scary, so we could instead keep a "specialized" version of the bytecode in the cache and use it when it is present. Some ideas for what we could do with this cached information:

- For binary operators, we can grab both operands, resolve the addition function, and cache it together with the operand types and their version tags. If the types and version tags match on a later execution, we can call the cached addition function directly instead of resolving it again.

- For loading methods, we could cache the bound method, as Yury originally proposed here: https://mail.python.org/pipermail/python-dev/2016-January/142945.html.

- We could also do the same for operations like "some_container[]" when the container is a builtin. We can replace (specialize) the generic BINARY_SUBSCR opcode with one that directly uses the builtin operations.
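
The binary-operator idea in the first bullet above can be sketched as a per-site cache guarded by the operand types and their version tags. This is a toy model under stated assumptions: `ToyType`, `BinopCache`, `resolve_add`, `cached_add`, `toy_type_set_add` and the two slot functions are hypothetical, the `version` field only loosely mirrors the idea behind CPython's type version tags, and "resolution" simply picks the left type's slot.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*binop_fn)(long, long);

/* Hypothetical toy type object. `version` must change whenever the
   type's behaviour changes, so caches keyed on it become invalid. */
typedef struct {
    const char *name;
    unsigned int version;
    binop_fn add;            /* the type's addition "slot" */
} ToyType;

/* Replace a type's addition slot and bump its version tag. */
static void toy_type_set_add(ToyType *t, binop_fn fn) {
    assert(t != NULL);
    t->add = fn;
    t->version++;
}

/* Per-instruction cache for one binary-add site. */
typedef struct {
    const ToyType *left_type, *right_type;
    unsigned int left_version, right_version;
    binop_fn fn;             /* cached, already-resolved addition function */
    int misses;              /* slow-path resolutions, kept for illustration */
} BinopCache;

/* Slow path. In this toy model resolution just picks the left type's
   slot; a real resolver would also consult the right operand's type. */
static binop_fn resolve_add(const ToyType *lt, const ToyType *rt) {
    (void)rt;
    return lt->add;
}

static long cached_add(BinopCache *c, const ToyType *lt, long a,
                       const ToyType *rt, long b) {
    assert(c != NULL && lt != NULL && rt != NULL);
    if (c->fn != NULL && c->left_type == lt && c->right_type == rt &&
        c->left_version == lt->version && c->right_version == rt->version) {
        return c->fn(a, b);  /* hit: same types, same versions */
    }
    /* Miss: resolve once and store the result with its guards. */
    c->misses++;
    c->left_type = lt;
    c->right_type = rt;
    c->left_version = lt->version;
    c->right_version = rt->version;
    c->fn = resolve_add(lt, rt);
    return c->fn(a, b);
}

/* Example slots; the second stands in for a user redefining __add__. */
static long add_longs(long a, long b) { return a + b; }
static long add_longs_plus_one(long a, long b) { return a + b + 1; }
```

The version tag is what makes the cache safe: comparing only the type pointers would keep calling the stale function after `toy_type_set_add` modifies the type, whereas bumping the version forces one more slow-path resolution.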

The plan will be:

- Build an infrastructure/framework for caching that allows us to optimize/deoptimize individual opcodes.
- Refactor the existing specialization for LOAD_GLOBAL/LOAD_ATTR to use that infrastructure.
- Identify which operations could benefit from specialization and add them one by one.
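
The first step of the plan could take the shape of a table of per-opcode specializer hooks, so that adding a new specialization means registering one function rather than inlining more code into ceval.c. This is a minimal sketch under stated assumptions, not an existing CPython API: `Instr`, `register_specializer`, `maybe_specialize`, `deoptimize`, `specialize_subscr`, `WARMUP` and the `OP_*` values are all hypothetical, and the warmup count of 8 is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes; not CPython's real numbering. */
enum { OP_LOAD_ATTR, OP_LOAD_ATTR_CACHED,
       OP_BINARY_SUBSCR, OP_BINARY_SUBSCR_LIST, NUM_OPS };

#define WARMUP 8  /* executions before a specialization attempt (arbitrary) */

typedef struct {
    int op;          /* current, possibly specialized, opcode */
    int generic_op;  /* opcode to fall back to on deoptimization */
    int counter;     /* executions left before the next attempt */
} Instr;

/* A specializer inspects runtime state and returns a specialized
   opcode, or -1 if no specialization applies. */
typedef int (*specializer_fn)(const void *runtime_state);

/* One row per generic opcode; unregistered rows stay NULL. */
static specializer_fn specializers[NUM_OPS];

static void register_specializer(int generic_op, specializer_fn fn) {
    assert(generic_op >= 0 && generic_op < NUM_OPS);
    specializers[generic_op] = fn;
}

/* Called by a generic opcode each time it executes. */
static void maybe_specialize(Instr *in, const void *state) {
    assert(in != NULL && in->generic_op >= 0 && in->generic_op < NUM_OPS);
    specializer_fn fn = specializers[in->generic_op];
    if (fn == NULL || in->op != in->generic_op) return;
    if (--in->counter > 0) return;        /* still warming up */
    int spec = fn(state);
    if (spec >= 0) in->op = spec;          /* rewrite the instruction */
    in->counter = WARMUP;                  /* simplest policy: fixed reset */
}

/* Called by a specialized opcode when its guard fails. */
static void deoptimize(Instr *in) {
    assert(in != NULL);
    in->op = in->generic_op;
    in->counter = WARMUP;
}

/* Example specializer: `state` points to a flag saying whether the
   container is a builtin list (a stand-in for a real type check). */
static int specialize_subscr(const void *state) {
    const int *is_list = state;
    return *is_list ? OP_BINARY_SUBSCR_LIST : -1;
}
```

With `specialize_subscr` registered for `OP_BINARY_SUBSCR`, an instruction stays generic for its first seven executions, becomes `OP_BINARY_SUBSCR_LIST` on the eighth, and returns to `OP_BINARY_SUBSCR` when `deoptimize` is called. Resetting the counter to a fixed value is the simplest policy; an exponential backoff would avoid repeatedly re-specializing sites whose guards keep failing.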

----------
components: C API
messages: 379272
nosy: Mark.Shannon, methane, nascheme, pablogsal, vstinner, yselivanov
priority: normal
severity: normal
status: open
title: Caching infrastructure for the evaluation loop: specialised opcodes
versions: Python 3.10

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue42115>
_______________________________________