[SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster

Wes McKinney wesmckinn at gmail.com
Mon Nov 9 16:03:18 EST 2009


On Mon, Nov 9, 2009 at 3:56 PM, David Baddeley
<david_baddeley at yahoo.com.au> wrote:
> Hi Rohit,
>
> I've had a lot of success using Pyro (pyro.sourceforge.net) to distribute tasks across a cluster. Pyro is a remote-objects implementation for Python that makes inter-process communication really easy. The disadvantage of this approach is that you've got to write your own server to distribute the tasks, but that is almost trivial (mine is a class with getTask and postTask methods, with the tasks stored internally in a list, made remotely accessible using Pyro). The advantage is that it has worked well on every platform I've tried, and it's really easy to add things like a timeout on tasks so that they can be reassigned if one of the workers falls over or is killed (I've had workers running as a Windows screensaver). My tasks use a mixture of Python and C, although no communication takes place in the C code.
>
> I took this route before I was aware of multiprocessing, the parallel components of IPython, etc., and the communication overhead when using Pyro is relatively high, so these other options would definitely be worth looking into.
>
> I can post the code for a minimal task server/client if you like.
>
> best wishes,
> David
>
> --- On Tue, 10/11/09, Rohit Garg <rpg.314 at gmail.com> wrote:
>
>> From: Rohit Garg <rpg.314 at gmail.com>
>> Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster
>> To: "SciPy Users List" <scipy-user at scipy.org>, numpy-discussions at scipy.org
>> Received: Tuesday, 10 November, 2009, 7:11 AM
>> Hi all,
>>
>> I have an embarrassingly parallel problem, very nicely suited to
>> parallelization. I am looking for community feedback on how best to
>> approach it. Basically, I just set up a bunch of tasks, and the
>> various CPUs pull data, process it, and send it back. Out-of-order
>> arrival of results is no problem. The processing times involved are
>> so large that the communication is effectively free, and hence I
>> don't care how fast or slow the communication is. I thought I'd ask
>> in case somebody has done this before, to avoid reinventing the
>> wheel. Any other suggestions are welcome too.
>>
>> My only constraint is that it should be able to run a Python
>> extension (C++) with a minimum of fuss. I want to minimize the
>> headaches involved in setting up and writing the boilerplate code.
>> Which framework/approach/library would you recommend?
>>
>> There is one method mentioned at [1], and of course one could
>> resort to something like mpi4py.
>>
>> [1] http://docs.python.org/library/multiprocessing.html {see the
>> last example}
>>
>> --
>> Rohit Garg
>>
>> http://rpg-314.blogspot.com/
>>
>> Senior Undergraduate
>> Department of Physics
>> Indian Institute of Technology
>> Bombay

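For anyone who would rather roll their own along the lines David
describes, a getTask/postTask server is only a few lines of Pyro.
Here's a rough, untested sketch using the Pyro4 API (the class name,
object name, host, and port are placeholders, not David's actual code;
a real version would add locking and the task timeout/reassignment he
mentions):

    import Pyro4

    @Pyro4.expose
    class TaskServer(object):
        def __init__(self):
            self.tasks = []    # pending tasks
            self.results = []  # (task_id, result) pairs, in any order

        def addTask(self, task):
            self.tasks.append(task)

        def getTask(self):
            # Workers poll this; None means nothing left to hand out.
            return self.tasks.pop(0) if self.tasks else None

        def postTask(self, task_id, result):
            self.results.append((task_id, result))

    if __name__ == '__main__':
        daemon = Pyro4.Daemon(host="0.0.0.0", port=9090)
        daemon.register(TaskServer(), "taskserver")
        daemon.requestLoop()

Each worker is then just a pull loop (process() stands in for whatever
your Python/C++ extension actually does):

    import Pyro4

    server = Pyro4.Proxy("PYRO:taskserver@serverhost:9090")
    while True:
        task = server.getTask()
        if task is None:
            break
        task_id, data = task
        server.postTask(task_id, process(data))
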
Here's a little parallel processing library using Pyro which might be
of interest to some:

http://code.google.com/p/papyros/
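
And the multiprocessing route Rohit points to ([1] above, the last
example in the docs) boils down to publishing a pair of queues through
a manager. A rough, untested sketch (address, port, and authkey are
placeholders; on Python 3 the Queue module is spelled queue):

    from multiprocessing.managers import BaseManager
    import Queue

    task_queue = Queue.Queue()
    result_queue = Queue.Queue()

    class QueueManager(BaseManager):
        pass

    QueueManager.register('get_task_queue', callable=lambda: task_queue)
    QueueManager.register('get_result_queue', callable=lambda: result_queue)

    manager = QueueManager(address=('', 50000), authkey='changeme')
    server = manager.get_server()
    server.serve_forever()

Workers connect to the same manager and block on the task queue:

    from multiprocessing.managers import BaseManager

    class QueueManager(BaseManager):
        pass

    QueueManager.register('get_task_queue')
    QueueManager.register('get_result_queue')

    manager = QueueManager(address=('serverhost', 50000),
                           authkey='changeme')
    manager.connect()
    tasks = manager.get_task_queue()
    results = manager.get_result_queue()

    while True:
        task_id, data = tasks.get()
        results.put((task_id, process(data)))  # process() = your C++ code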


