[issue4753] Faster opcode dispatch on gcc

Sat Jan 3 03:20:58 CET 2009

Paolo 'Blaisorblade' Giarrusso <p.giarrusso at gmail.com> added the comment:

> I'm not an expert in this kind of optimizations. Could we gain more
speed by making the dispatcher table more dense? Python has less than
128 opcodes (len(opcode.opmap) == 113) so they can be squeezed in a
smaller table. I naively assume a smaller table increases the amount of
cache hits.

Well, you have no binary compatibility constraint with a new release, so
it can be tried and benchmarked, or it can be done anyway!
On x86_64 the impact of the jump table is 8 bytes per pointer * 256
pointers = 2KiB, and the L1 data cache of Pentium4 can be 8KiB or 16KiB
wide.
But I don't expect this to be noticeable in most synthetic
microbenchmarks. Matrix multiplication would be the perfect one I guess;
the repeated column access would kill the L1 data cache, if the whole
matrixes don't fit.

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4753>
_______________________________________