[pypy-dev] Helping with STM at the PyCon 2013 (Santa Clara) sprints

Taavi Burns taavi.burns at gmail.com
Mon Feb 18 14:27:11 CET 2013


I got frustrated with my (actually dying now) local box and signed up
for AWS. I used an m1.medium instance to build PyPy (~100 minutes),
then upgraded it to a c1.xlarge (which claims to be 8 virtual cores of
2.5 ECU each).

With the same sample program, I see the expected kinds of speedups! :D
So benchmarking under VMware is right out.

Hopefully that info is useful for someone else in the future. :)
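For anyone who wants to reproduce this, the sample program below is a sketch of the no-conflict benchmark from the thread. The `transaction` module only ships with pypy-stm builds, so this sketch adds a sequential stand-in class (my addition, not part of the thread's program) to stay runnable elsewhere; the `results` list is likewise added just to make the output observable.

```python
# Sketch of the no-conflict test program discussed in this thread.
# NOTE: the fallback class below is a hypothetical stand-in for
# illustration; on a real pypy-stm build, `import transaction` succeeds
# and the two foo() calls can run as parallel transactions.
try:
    import transaction  # available only on pypy-stm builds
except ImportError:
    class transaction:
        """Sequential stand-in: runs queued callables in order."""
        _queue = []

        @staticmethod
        def set_num_threads(n):
            pass  # no-op in the stand-in

        @staticmethod
        def add(f):
            transaction._queue.append(f)

        @staticmethod
        def run():
            queue, transaction._queue = transaction._queue, []
            for f in queue:
                f()

results = []  # added so the sketch has visible output

def foo():
    # Purely local work: no shared state, so transactions never conflict.
    x = 0
    for y in range(100000):
        x += y
    results.append(x)

transaction.set_num_threads(2)
transaction.add(foo)
transaction.add(foo)
transaction.run()
print(results)  # two independent sums of range(100000)
```

Since the two transactions touch no shared data, this is close to the best case for STM: any slowdown going from 1 to 2 threads points at overhead in the environment or the runtime, not at transaction conflicts.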

On Sun, Feb 17, 2013 at 6:38 PM, Taavi Burns <taavi.burns at gmail.com> wrote:
> That's great, thanks! I did get it to work when you wrote earlier, but
> it's definitely faster now.
>
> I tried a ridiculously simple and no-conflict parallel program and
> came up with this, which gave me some questionable performance numbers
> from a build of 65ec96e15463:
>
> taavi at pypy:~/pypy/pypy/goal$ ./pypy-c -m timeit -s 'import
> transaction; transaction.set_num_threads(1)' '
> def foo():
>    x = 0
>    for y in range(100000):
>        x += y
> transaction.add(foo)
> transaction.add(foo)
> transaction.run()'
> 10 loops, best of 3: 198 msec per loop
>
> taavi at pypy:~/pypy/pypy/goal$ ./pypy-c -m timeit -s 'import
> transaction; transaction.set_num_threads(2)' '
> def foo():
>    x = 0
>    for y in range(100000):
>        x += y
> transaction.add(foo)
> transaction.add(foo)
> transaction.run()'
> 10 loops, best of 3: 415 msec per loop
>
>
> It's entirely possible that this is an effect of running inside a
> VMware guest (set to use 2 cores) running on my Core2Duo laptop. If
> this is the case, I'll refrain from trying to do anything remotely
> like benchmarking in this environment in the future. :)
>
> Would it be more helpful (if I want to contribute to STM) to use
> something like a high-CPU EC2 instance, or should I look at obtaining
> something like an 8-real-core AMD X8?
>
> (my venerable X2 has started to disagree with its RAM, so it's prime
> for retirement)
>
> Thanks!
>
> On Sun, Feb 17, 2013 at 3:58 AM, Armin Rigo <arigo at tunes.org> wrote:
>> Hi Taavi,
>>
>> I finally fixed pypy-stm with signals.  Now I'm getting again results
>> that scale with the number of processors.
>>
>> Note that it stops scaling up at some point, around 4 or 6 threads, on
>> machines I tried it on.  I suspect it's related to the fact that
>> physical processors have 4 or 6 cores internally, but the results are
>> still a bit inconsistent.  Using the "taskset" command to force the
>> threads to run on particular physical sockets seems to help a little
>> bit with some numbers.  Fwiw, I got the maximum throughput on a
>> 24-cores machine by really running 24 threads, but that seems
>> wasteful, as it is only 25% better than running 6 threads on one
>> physical socket.
>>
>> The next step will be trying to reduce the overhead, currently
>> considerable (about 10x slower than CPython, too much to ever have any
>> net benefit).  Also high on the list is fixing the constant memory
>> leak (i.e. implementing major garbage collection steps).
>>
>>
>> A bientôt,
>>
>> Armin.
>
>
>
> --
> taa
> /*eof*/



--
taa
/*eof*/

