[pypy-dev] trying out STM for some numbers on more cores
wlavrijsen at lbl.gov
Wed Oct 30 01:22:16 CET 2013
Davide,
> I don't know. But I do know that processor/thread binding (if that is what
> you mean by "pin")
is what I meant. :) But a quick-and-dirty implementation makes little
difference, except for 8 and 16 threads, where it helps a bit.
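For reference, a quick-and-dirty binding along these lines can be sketched in
plain Python. This is only an illustration, not the code used for the numbers
in this message: run_pinned and its round-robin CPU assignment are
hypothetical, os.sched_setaffinity is Linux-only, and the sketch silently
falls back to unpinned threads where binding is unavailable or refused.

```python
import os
import threading


def run_pinned(worker, n_threads):
    """Run worker(index) in n_threads threads, binding each to one CPU.

    Hypothetical helper: CPUs are handed out round-robin from the set the
    process is allowed to use. Returns the workers' results in index order.
    """
    can_pin = hasattr(os, "sched_setaffinity")  # only present on Linux
    cpus = sorted(os.sched_getaffinity(0)) if can_pin else []
    results = [None] * n_threads

    def target(index):
        if can_pin and cpus:
            try:
                # On Linux, pid 0 refers to the calling thread, so each
                # thread binds itself without affecting the others.
                os.sched_setaffinity(0, {cpus[index % len(cpus)]})
            except OSError:
                pass  # binding refused: run unpinned
        results[index] = worker(index)

    threads = [threading.Thread(target=target, args=(i,))
               for i in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_pinned(lambda index: index * index, 4))
```

With more threads than available CPUs, several threads end up sharing a CPU,
which is one reason a binding this simple may only help for some thread
counts.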
Running some more, I noticed that there are plenty of other overheads and the
'avg. time' doesn't get anywhere near stable until the number of iterations is
in the 1000s (I used 100 before).
iterations   16 threads   32 threads   PyPy-2.1
       100       127.43       146.57       9.63
       200        77.59        86.37       7.80
       500        46.92        49.12       6.82
      1000        36.51        33.80       6.29
      2000        32.18        28.69       6.40
The numbers are closer together, and HT (hyper-threading) now helps (note that
the "slowdown" for 2000 iterations for 2.1 is not significant; I should run
this multiple times and average, but this is just for fun). It is obvious,
though, that overheads are larger for STM at the moment, and therefore stay
significant up to a larger number of iterations.
The differences at larger numbers of iterations are much smaller for smaller
numbers of threads (and zero for 1 thread). Intuitively that makes sense. It
also says that 16 threads can give an 11x speedup if there's enough work to do.
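To make the overhead argument a bit more concrete, here is a rough sketch
that fits the 'avg. time' columns above to avg(n) = steady + fixed / n, where
n is the number of iterations. The model is my assumption (a one-off cost
amortized over the run), not something measured separately, and fit_overhead
is just an illustrative name. The times are copied from the table, in
whatever units the benchmark reports.

```python
# Average times copied from the table in this message (units as reported).
ITERATIONS = [100, 200, 500, 1000, 2000]
AVG_TIME = {
    "16 threads": [127.43, 77.59, 46.92, 36.51, 32.18],
    "32 threads": [146.57, 86.37, 49.12, 33.80, 28.69],
    "PyPy-2.1": [9.63, 7.80, 6.82, 6.29, 6.40],
}


def fit_overhead(iterations, avg_times):
    """Least-squares fit of avg = steady + fixed / n.

    Assumed model: 'fixed' is a one-off cost spread over n iterations and
    'steady' is the average time once that cost is amortized.
    Returns (steady, fixed).
    """
    xs = [1.0 / n for n in iterations]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(avg_times) / count
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, avg_times))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    fixed = sxy / sxx                 # slope against 1/n
    steady = mean_y - fixed * mean_x  # intercept, i.e. the n -> infinity limit
    return steady, fixed


if __name__ == "__main__":
    for label, times in AVG_TIME.items():
        steady, fixed = fit_overhead(ITERATIONS, times)
        print("%-10s  steady ~ %7.2f   fixed ~ %9.1f" % (label, steady, fixed))
```

With only five points per column and no repeated runs, the fitted values are
indicative at best; the useful part is comparing the 'fixed' term between the
STM columns and 2.1.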
Best regards,
Wim
--
WLavrijsen at lbl.gov -- +1 (510) 486 6411 -- www.lavrijsen.net