Exploiting Dual Core's with Py_NewInterpreter's separated GIL ?

Tue Nov 7 07:03:03 EST 2006

"Martin v. Löwis" <martin at v.loewis.de> writes:
> Ah, but in the case where the lock# signal is used, it's known that
> the data is not in the cache of the CPU performing the lock operation;
> I believe it is also known that the data is not in the cache of any
> other CPU. So the CPU performing the LOCK INC sequence just has
> to perform two memory cycles. No cache coherency protocol runs
> in that case.

Paul Rubin wrote:
> How can any CPU know in advance that the data is not in the cache of
> some other CPU?

In the case where the LOCK# signal is asserted, the area of memory
accessed is marked as being uncacheable.  In an SMP system all CPUs must
have the same mapping of cached and uncached memory or things like this
break.  In the case where the LOCK# signal isn't used, the MESI
protocol informs the CPU of which of its cache lines might also be in
the cache of another CPU.

> OK, this is logical, but it already implies a cache miss, which costs
> many dozen (100?) cycles.  But this case may be uncommon, since one
> hopes that cache misses are relatively rare.

The cost of the cache miss is the same whether the increment
instruction is locked or not.
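
To make the two kinds of increment concrete, here is a minimal C11 sketch.
The helper names (incr_plain, incr_locked) are hypothetical and are not
taken from CPython or from this thread.  On x86 a compiler will typically
implement atomic_fetch_add with a LOCK-prefixed instruction, but that is
an assumption about the target and compiler, not something the C standard
guarantees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical reference-count helpers, for illustration only. */

/* Plain increment: an ordinary read-modify-write.  Another CPU can
 * interleave between the read and the write, so updates may be lost. */
long incr_plain(long *count)
{
    assert(count != NULL);
    *count += 1;
    return *count;
}

/* Atomic increment: the read-modify-write is indivisible with respect
 * to other threads.  atomic_fetch_add returns the old value, so add 1
 * to report the new one. */
long incr_locked(atomic_long *count)
{
    assert(count != NULL);
    return atomic_fetch_add(count, 1) + 1;
}
```

Both functions touch the same memory location once per call, so if that
location is not in the cache, both pay for the miss; the difference is only
in what is guaranteed about other CPUs during the update.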

                                  Ross Ridge