[issue30509] Optimize calling type slots

Tue May 30 10:00:33 EDT 2017

Serhiy Storchaka added the comment:

> type-slot-calls.diff: Can you please create a pull request?

I posted just a patch because I expected that you might want to play with it and propose an alternative patch. It is simpler to compare patches with Rietveld than on GitHub. But if you prefer, I'll make a PR.

> Hum, can you please post a microbenchmark results to see the effect of the patch?

$ cat x.py
class A(object):
    def __add__(self, other):
        return 42

$ ./python -m perf timeit -s 'from x import A; a = A(); b = A()' --duplicate 100 'a.__add__(b)'
Unpatched:  Mean +- std dev: 256 ns +- 9 ns
Patched:    Mean +- std dev: 255 ns +- 10 ns

$ ./python -m perf timeit -s 'from x import A; a = A(); b = A()' --duplicate 100 'a + b'
Unpatched:  Mean +- std dev: 332 ns +- 10 ns
Patched:    Mean +- std dev: 286 ns +- 5 ns
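
If the perf module is not installed, a rough comparison of the same two statements can be made with the standard library's timeit. This is only a sketch: the helper name bench and the number/repeat values are arbitrary choices for illustration, and the absolute numbers will be noisier than those reported by perf.

```python
import timeit


class A(object):
    def __add__(self, other):
        return 42


def bench(stmt, number=100000, repeat=5):
    """Return the best observed time per call of stmt, in nanoseconds.

    bench is a hypothetical helper, not part of perf or of the patch.
    """
    a, b = A(), A()
    timings = timeit.repeat(stmt, globals={"a": a, "b": b},
                            number=number, repeat=repeat)
    # Use the minimum, which is the run least disturbed by other load.
    return min(timings) / number * 1e9


method_ns = bench("a.__add__(b)")   # explicit method call
operator_ns = bench("a + b")        # goes through the nb_add type slot

print("a.__add__(b): %.0f ns" % method_ns)
print("a + b:        %.0f ns" % operator_ns)
```

Run it once with the unpatched build and once with the patched build, then compare the "a + b" line.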

> * the calling convention of the Python C API requires to create a tuple, and that's expensive

The patch also makes other optimizations, like avoiding varargs and avoiding the creation of an intermediate method object. All of this is already applied as a side effect of your changes.

> * "a + b" has a complex semantics which requires to check for __radd__, check for issubclass(), etc.

Since a and b have the same type, the complex semantics don't play a role here.
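
The same-type shortcut can be observed from pure Python. The sketch below is only an illustration: the classes Same and Other and the calls list are hypothetical names and are not part of the patch. When both operands have the same type, the reflected method is never tried; it is consulted only when the types differ.

```python
calls = []  # hypothetical log of which special methods were invoked


class Same(object):
    def __add__(self, other):
        calls.append("Same.__add__")
        return NotImplemented  # force the interpreter to look further

    def __radd__(self, other):
        calls.append("Same.__radd__")
        return 42


class Other(object):
    def __radd__(self, other):
        calls.append("Other.__radd__")
        return 42


# Same type on both sides: __radd__ is skipped, so the addition fails.
try:
    Same() + Same()
    same_type_result = "no error"
except TypeError:
    same_type_result = "TypeError"
same_type_calls = list(calls)

# Different types: after __add__ gives up, the right operand's __radd__ runs.
del calls[:]
mixed_result = Same() + Other()
mixed_calls = list(calls)

print(same_type_result, same_type_calls)
print(mixed_result, mixed_calls)
```

In the benchmark above, __add__ returns a value directly, so even this fallback path is never reached.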

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue30509>
_______________________________________