[pypy-dev] trying out STM for some numbers on more cores

Wed Oct 30 01:22:16 CET 2013

Davide,

> I don't know. But I do know that processor/thread binding (if that is what 
> you mean by "pin")

is what I meant. :) But a quick-and-dirty implementation of pinning does not
seem to make much difference, except for 8 and 16 threads, where it helps a bit.
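Wim's quick-and-dirty binding code isn't shown here. As a rough illustration only,
this is one way to pin threads to CPUs from Python. It assumes Linux, where
sched_setaffinity() accepts a thread id, and Python 3.8+ for
threading.get_native_id(). It is not what pypy-stm used in 2013. The helper
names _pinned_worker and run_pinned are hypothetical.

```python
import os
import threading


def _pinned_worker(cpu, work, results, index):
    """Bind the calling thread to one CPU, then run work() and store its result."""
    # Assumption: Linux. There sched_setaffinity() accepts a thread id, so
    # passing this thread's native id pins only this thread, not the process.
    os.sched_setaffinity(threading.get_native_id(), {cpu})
    results[index] = work()


def run_pinned(n_threads, work):
    """Run work() on n_threads threads, each pinned round-robin to an allowed CPU."""
    cpus = sorted(os.sched_getaffinity(0))  # CPUs the calling thread may use
    results = [None] * n_threads
    threads = [
        threading.Thread(target=_pinned_worker,
                         args=(cpus[i % len(cpus)], work, results, i))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, `run_pinned(4, lambda: os.sched_getaffinity(0))` should return four
single-CPU sets on Linux. Round-robin assignment means that with more threads than
CPUs, threads share cores. With hyper-threading, the logical-CPU numbering decides
whether two threads land on the same physical core.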

Running some more tests, I noticed that there are plenty of other overheads, and
that the 'avg. time' doesn't get anywhere near stable until the number of
iterations is in the 1000s (I used 100 before).
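The harness behind the numbers below isn't shown. A minimal sketch of one way to
check whether an average has settled is to time a fixed number of iterations,
divide by the count, and repeat each measurement a few times to see the spread.
avg_time_per_iteration, repeat_runs and the dummy workload are all hypothetical.

```python
import statistics
import time


def avg_time_per_iteration(work, iterations):
    """Wall-clock time of `iterations` calls to work(), divided by the count."""
    start = time.perf_counter()
    for _ in range(iterations):
        work()
    return (time.perf_counter() - start) / iterations


def repeat_runs(work, iterations, runs=5):
    """Repeat the measurement `runs` times; return (mean, standard deviation)."""
    samples = [avg_time_per_iteration(work, iterations) for _ in range(runs)]
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    # Dummy workload standing in for the real benchmark body.
    def work():
        sum(i * i for i in range(100))

    for iterations in (100, 200, 500, 1000, 2000):
        mean, spread = repeat_runs(work, iterations)
        print(f"{iterations:5d}  {mean * 1e6:8.2f} us  (+/- {spread * 1e6:.2f})")
```

If the mean keeps falling as iterations grow, or the spread is large relative to
the mean, then one-off costs such as startup and warm-up have probably not been
amortized yet.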

    iterations    STM 16 threads    STM 32 threads    PyPy-2.1
        100               127.43            146.57        9.63
        200                77.59             86.37        7.80
        500                46.92             49.12        6.82
       1000                36.51             33.80        6.29
       2000                32.18             28.69        6.40

The numbers are closer together, and hyper-threading (HT) now helps: from 1000
iterations on, 32 threads beat 16 threads. (Note that the "slowdown" at 2000
iterations for 2.1 is not significant; I should run this multiple times and
average, but this is just for fun.) It is obvious, though, that overheads are
currently larger for STM, so they stay significant up to larger numbers of
iterations. At larger numbers of iterations, the differences are much smaller
for fewer threads (and zero for 1 thread). Intuitively that makes sense. It
also says that 16 threads can give an 11x speedup if there's enough work to do.
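To make the comparison explicit, the sketch below takes the avg. times straight
from the table. It assumes lower is better and that the 16- and 32-thread columns
are the STM runs. It checks where 32 threads beat 16, and how far STM still is
from PyPy-2.1. The names TIMES and summarize are hypothetical.

```python
# Avg. times copied from the table above; lower is better.
# Columns: iterations -> (STM 16 threads, STM 32 threads, PyPy-2.1).
TIMES = {
    100: (127.43, 146.57, 9.63),
    200: (77.59, 86.37, 7.80),
    500: (46.92, 49.12, 6.82),
    1000: (36.51, 33.80, 6.29),
    2000: (32.18, 28.69, 6.40),
}


def summarize(times):
    """Per iteration count: does HT help, and STM time relative to PyPy-2.1."""
    rows = []
    for iterations, (t16, t32, t21) in sorted(times.items()):
        ht_helps = t32 < t16  # 32 threads (with HT) faster than 16
        rows.append((iterations, ht_helps, t16 / t21, t32 / t21))
    return rows


for iterations, ht_helps, r16, r32 in summarize(TIMES):
    print(f"{iterations:5d}  HT helps: {str(ht_helps):5s}  "
          f"16 thr / 2.1: {r16:5.2f}x  32 thr / 2.1: {r32:5.2f}x")
```

Under those assumptions, HT helps only from 1000 iterations on. The 32-thread
STM time drops from about 15x the PyPy-2.1 time at 100 iterations to about 4.5x
at 2000 (28.69 / 6.40 ≈ 4.48).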

Best regards,
            Wim
-- 
WLavrijsen at lbl.gov    --    +1 (510) 486 6411    --    www.lavrijsen.net