[pypy-dev] Pluggable HTM

Tue Dec 3 19:47:14 CET 2013

Hi Dimitri,

On Wed, Nov 27, 2013 at 9:17 AM, Dimitri Vorona <alendit at googlemail.com> wrote:
> the original STM proposal spoke of HTM as of a thing of a far future. Now,
> Haswells are out and provide built-in HTM support in form of TSX. In the
> near future I expect more and more systems to have it.
>
> Are there plans to make PyPy use HTM if it is available on the system?

I don't know yet.  I've just started playing with an Intel Haswell,
and getting slightly bad results in the form of too many random
transaction aborts.

This seems to be the case both for "small" transactions that access
only some 20KB of data and for larger transactions of up to almost
768KB, which is impressively three times the size of the L2 cache;
this suggests that even the L3 cache can dedicate a part of its
resources to storing the transaction's cache lines.

But a naive extrapolation of the single-threaded results shows that,
if we had instead 8 threads running with the same results, even on
completely independent data, they would still abort too many
transactions each.  Whenever a transaction needs to be redone without
HTM, it really needs to stop all other threads.  So "too many" is in
this sense: even if it is only 10-20% on each core, it's enough to
prevent any scaling beyond just a couple of cores.
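
To make the extrapolation concrete, here is a rough sketch of an
Amdahl-style toy model.  It is an illustration only, not the
measurement behind the numbers above.  The assumptions: every
transaction costs one unit of time, a fixed fraction abort_rate of
them abort, and each aborted transaction is redone serially while all
other threads are stopped.  The name modeled_speedup is made up for
this example.

```python
def modeled_speedup(threads, abort_rate):
    """Speedup over a single thread that never aborts (toy model)."""
    # Every transaction is attempted once, spread evenly over the threads.
    parallel_time = 1.0 / threads
    # Assumption: aborted transactions are redone one at a time,
    # with all other threads stopped, so this part does not scale.
    serial_time = abort_rate
    return 1.0 / (parallel_time + serial_time)


for abort_rate in (0.10, 0.20):
    for threads in (1, 2, 4, 8):
        print(f"abort rate {abort_rate:.0%}, {threads} threads: "
              f"speedup {modeled_speedup(threads, abort_rate):.2f}")
```

In this model the speedup can never exceed 1/abort_rate, however many
threads are added, and 8 threads stay well below 8x at either abort
rate.  The model is optimistic: it ignores the cost of stopping and
restarting the other threads, which would make real scaling worse.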

It may be that I'm missing something, like a way to learn where
conflicts occur.  But all in all it is unclear if this is good enough
for PyPy (or CPython).  The next step, which I might do anyway, would
be to extract from the pypy-stm branch the general logic (most notably
the numerous conflict-avoiding small changes), and try to run that
with HTM.  This probably requires writing a different GC, but it
should be easy at this point to do, experimentally.

A bientôt,

Armin.