[issue4753] Faster opcode dispatch on gcc

Alexandre Vassalotti report at bugs.python.org
Sun Jan 4 05:51:14 CET 2009


Alexandre Vassalotti <alexandre at peadrop.com> added the comment:

Paolo wrote:
> So, can you try dropping the switch altogether, using always computed
> goto and seeing how does the resulting code get compiled?

Removing the switch won't be possible unless we change the semantics
of EXTENDED_ARG. In addition, I doubt the improvement, if any, would be
worth the added complexity.

> To be absolutely clear: x86_64 has more registers, so the rest of the
> interpreter is faster than x86, but dispatch still takes the same
> absolute time, which is 70% on x86_64, but only 50% on x86 (those are
> realistic figures);

I don't understand what you mean by "absolute time" here. Do you
actually mean the time spent interpreting bytecodes compared to the time
spent in the other parts of Python? If so, your figures are wrong for
CPython on x86-64. It is about 50% just like on x86 (when running
pybench). With the patch, this drops to 35% on x86-64 and to 45% on x86.

> In my toy interpreter, computing last_i for each dispatch doesn't give
> any big slowdown, but storing it in f->last_i gives a ~20% slowdown.

I patched ceval.c to minimize f->last_i manipulations in the dispatch
code.  On x86, I got an extra 9% speed up on pybench. However, the patch
is a bit clumsy and a few unit tests are failing. I will see if I can
improve it and open a new issue if worthwhile.

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4753>
_______________________________________
