[Python-Dev] [ANN] VPython 0.1

Thu Oct 23 15:13:53 CEST 2008

A.M. Kuchling <amk <at> amk.ca> writes:
> 
>     threaded code: A technique for implementing virtual machine
>     interpreters, introduced by J.R. Bell in 1973, where each op-code in
>     the virtual machine instruction set is the address of some (lower
>     level) code to perform the required operation. This kind of virtual
>     machine can be implemented efficiently in machine code on most
>     processors by simply performing an indirect jump to the address which
>     is the next instruction.

Is this kind of optimization really that useful on modern CPUs? It removes a
memory access to the switch/case lookup table, which should shave off the
roughly 3 CPU cycles of latency of a modern L1 data cache hit, but it won't
remove the branch misprediction penalty of the indirect jump itself, which is
more on the order of 10-20 CPU cycles depending on pipeline depth.
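
To make the comparison concrete, here is a minimal sketch of the two dispatch
styles for a toy three-opcode VM. Everything in it (the opcodes, run_switch,
run_threaded, MAX_PROG) is made up for illustration; it is not code from
CPython or VPython. The threaded version relies on the "labels as values"
extension of GCC and Clang, so it is not standard C. Whether the compiler
actually turns the switch into a jump-table lookup is its own decision.

```c
#include <stddef.h>

/* Hypothetical toy instruction set, for illustration only. */
enum { OP_INC, OP_DEC, OP_HALT };
#define MAX_PROG 64

/* Switch dispatch: each step fetches an opcode, then (typically) looks up
 * the handler through a compiler-generated jump table and jumps to it. */
int run_switch(const unsigned char *code, size_t n)
{
    int acc = 0;
    size_t pc = 0;
    while (pc < n) {
        switch (code[pc++]) {
        case OP_INC: acc++; break;
        case OP_DEC: acc--; break;
        default:     return acc;   /* OP_HALT or unknown opcode */
        }
    }
    return acc;
}

/* Direct-threaded dispatch: the program is first translated into an array
 * of handler addresses, so each step is a single indirect jump with no
 * per-instruction table lookup. */
int run_threaded(const unsigned char *code, size_t n)
{
    static void *const handlers[] = { &&do_inc, &&do_dec, &&do_halt };
    void *threaded[MAX_PROG + 1];
    void **ip;
    size_t i;
    int acc = 0;

    if (n > MAX_PROG)
        n = MAX_PROG;
    for (i = 0; i < n; i++)        /* unknown opcodes are treated as halt */
        threaded[i] = handlers[code[i] <= OP_HALT ? code[i] : OP_HALT];
    threaded[n] = &&do_halt;       /* sentinel so execution always stops */

    ip = threaded;
    goto **ip++;                   /* jump to the first handler */

do_inc:  acc++; goto **ip++;
do_dec:  acc--; goto **ip++;
do_halt: return acc;
}
```

Running both on the program { OP_INC, OP_INC, OP_INC, OP_DEC, OP_HALT }
leaves 2 in the accumulator. Note that the threaded version still ends every
handler with an indirect jump, which is exactly the jump whose misprediction
cost I am asking about.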

In 1973, CPUs were generally not pipelined and did not suffer any penalty for
indirect jumps, while table lookups could be slow, especially as they couldn't
be overlapped with other processing.

Thanks

Antoine.