[issue4753] Faster opcode dispatch on gcc

Sun Jan 4 03:28:10 CET 2009

Paolo 'Blaisorblade' Giarrusso <p.giarrusso at gmail.com> added the comment:

1st note: is that code from the threaded version? Note that you need to
modify the source so that it also accepts ICC before you can try that.
If you already did that, I guess the patch is not useful at all
with ICC since, as far as I can see, the jump is shared. It is vital to
this patch that the jump is not shared, so something similar to
-fno-crossjumping should be found for ICC.
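
To make "the jump is not shared" concrete, here is a minimal sketch of
threaded dispatch. It is not the patch itself: the opcodes, the function
name run_threaded and the fixed 16-slot stack are all hypothetical, and
the bytecode is assumed to be well formed. It relies on the GCC "labels
as values" extension, so it only builds with compilers that accept it.

```c
/* Hypothetical opcodes for a toy stack machine (not CPython's). */
enum { OP_PUSH1, OP_ADD, OP_DOUBLE, OP_HALT };

/* Runs `code` and returns the value on top of the stack at OP_HALT.
   Assumes well-formed bytecode that never needs more than 16 slots.
   Uses the GCC "labels as values" extension (&&label, goto *ptr). */
int run_threaded(const unsigned char *code)
{
    /* Jump table indexed by opcode; order must match the enum above. */
    static void *const targets[] = {
        &&do_push1, &&do_add, &&do_double, &&do_halt
    };
    int stack[16];
    int *sp = stack;                  /* points to the next free slot */
    const unsigned char *pc = code;   /* program counter */

/* Fetch the next opcode, advance pc, and jump to its handler.
   The macro is expanded in every handler, so the source contains
   one indirect jump per handler instead of a single shared one. */
#define DISPATCH() goto *targets[*pc++]

    DISPATCH();

do_push1:
    *sp++ = 1;
    DISPATCH();
do_add:
    sp--;
    sp[-1] += sp[0];
    DISPATCH();
do_double:
    sp[-1] *= 2;
    DISPATCH();
do_halt:
    return sp[-1];

#undef DISPATCH
}
```

If the compiler merges the identical "goto *targets[*pc++]" tails back
into one jump, the per-handler branch history is lost again; that
merging is what -fno-crossjumping is meant to prevent on GCC.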

2nd note: the answer to your questions seems to be yes, ICC has fewer
register spills. Compare, for instance, GCC's:
       movl    -272(%ebp), %ecx
       movzbl  (%ecx), %eax
       addl    $1, %ecx

and ICC's:
       movzbl    (%esi), %ecx
       incl      %esi

Both sequences load the next opcode and then increment the program
counter. In the code you posted, one can see that GCC spills the
program counter to memory, while ICC does not. Either ICC's spill
is elsewhere, or ICC is better here. It is widely known that ICC has
a much better optimizer in many cases, and I remember that GCC's
register allocator really needs improvement.
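
For reference, a byte load followed by a pointer increment is what a
C expression of the form *pc++ on an unsigned char pointer typically
compiles to. The sketch below is only an illustration with a
hypothetical function name, not code from ceval.c. Whether pc lives in
a register or is reloaded from the stack is decided by the compiler's
register allocator, not by anything in the C source.

```c
/* Hypothetical helper: sums opcode bytes until a 0 terminator. */
unsigned sum_opcodes(const unsigned char *pc)
{
    unsigned total = 0;
    for (;;) {
        /* Zero-extending byte load, then advance the program counter. */
        unsigned op = *pc++;
        if (op == 0)
            break;
        total += op;
    }
    return total;
}
```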

Finally, I'm a bit surprised by "addl $1, %ecx", since any peephole
optimizer should remove that; the only reason I'm not shocked is that
I've never seen perfect GCC output.

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4753>
_______________________________________