Exploiting Dual Core's with Py_NewInterpreter's separated GIL ?

Sat Nov 4 09:41:55 EST 2006

robert wrote:
> PS: Besides: what are speed costs of LOCK INC <mem> ?

That very much depends on the implementation. In

http://gcc.gnu.org/ml/java/2001-03/msg00132.html

Hans Boehm claims it's 15 cycles. The LOCK prefix
itself asserts the lock# bus signal for the entire
operation, meaning that the other processors
can't perform memory operations during that time.
On the P6, if the data is cacheable (for some Intel
definition of this word), the lock# signal will
not be asserted, just the cache gets locked.
The LOCK prefix also causes any pending writes
to be performed before the operation starts.

So in the worst case, a LOCK INC will have to
wait for pending writes, then assert the
lock# signal, then perform a read and a write
memory cycle, with the increment processor
cycle in between. Assuming a 400MHz memory bus
and a 4GHz processor, each memory cycle costs
about 10 processor cycles, so LOCK INC will take
around 20 cycles, whereas a plain INC might get
done in a single cycle or less (assuming
pipelining and caching).
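
The back-of-the-envelope estimate above can be written out as a
small calculation. This is only a sketch under the assumptions
stated in the paragraph: one read and one write, each costing one
full bus cycle, plus one processor cycle for the increment. The
function name and its parameters are made up for illustration, and
the time spent waiting for pending writes is not modelled.

```python
def lock_inc_cycles(cpu_hz, bus_hz, bus_transactions=2, alu_cycles=1):
    """Rough worst-case cost of a locked read-modify-write, in CPU cycles.

    Assumes every memory transaction takes exactly one bus cycle and
    that nothing overlaps; draining pending writes is ignored.
    """
    # How many processor cycles elapse during one memory bus cycle.
    cpu_cycles_per_bus_cycle = cpu_hz / bus_hz
    # Read + write on the bus, plus the increment itself.
    return bus_transactions * cpu_cycles_per_bus_cycle + alu_cycles


# The figures from the post: 4GHz processor, 400MHz memory bus.
estimate = lock_inc_cycles(cpu_hz=4_000_000_000, bus_hz=400_000_000)
print(estimate)
```

With these inputs the two bus transactions account for 20 of the
cycles, which is where the "around 20 cycles" figure comes from;
the cacheable case on the P6 avoids the bus transactions and is
not covered by this model.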

Regards,
Martin