client-server parallelised number crunching

geremy condra debatem1 at gmail.com
Wed Apr 27 02:54:24 EDT 2011


On Tue, Apr 26, 2011 at 10:58 PM, Hans Georg Schaathun
<georg at schaathun.net> wrote:
> On Tue, 26 Apr 2011 14:31:59 -0700, geremy condra
>  <debatem1 at gmail.com> wrote:
> :  Without knowledge of what you're doing it's hard to comment
> :  intelligently,
>
> I need to calculate map( foobar, L ) where foobar() is a pure function
> with no dependency on the global state, L is a list of tuples, each
> containing two numpy arrays, currently 500-1000 floats each + a scalar
> or two.  The result is a pair of floats.
>
> The foobar() function is demonstrably heavy enough to merit
> parallelisation.

This sounds like a Hadoop job, with the caveat that you still have to
get your objects across the network somehow. Have you tried xdrlib or
the struct module? I suspect either would save you some time.
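
As a rough sketch of the struct approach (the names pack_item and
unpack_item and the wire layout are made up for illustration, not
anything from your code): write the two array lengths and the scalar as
a fixed header, then all the floats as doubles in network byte order.
Plain sequences of floats stand in for the numpy arrays here.

```python
import struct

# Hypothetical wire layout: two unsigned 32-bit lengths, one double
# scalar, then the floats of both arrays as doubles. "!" selects
# network byte order with no alignment padding.
HEADER = "!IId"

def pack_item(a, b, scalar):
    """Serialise two float sequences and one scalar into bytes."""
    header = struct.pack(HEADER, len(a), len(b), scalar)
    body = struct.pack("!%dd" % (len(a) + len(b)), *a, *b)
    return header + body

def unpack_item(data):
    """Inverse of pack_item: returns (list_a, list_b, scalar)."""
    header_size = struct.calcsize(HEADER)
    n_a, n_b, scalar = struct.unpack(HEADER, data[:header_size])
    values = struct.unpack("!%dd" % (n_a + n_b), data[header_size:])
    return list(values[:n_a]), list(values[n_a:]), scalar
```

With 1000 floats per array that comes to about 16 KB per work item,
and the pair of floats coming back fits in 16 bytes.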

> The CPUs I have available to spread the load further are not clustered.
> They are prone to crash without warning and I do not have root access.
> I don't have exclusive use.  I do not even have physical access, so I
> cannot use a liveCD.  (They are, however, equipped with a batch queue
> system (torque).)

Hmm. I guess I'd boil it down to this: if you have the ability to
install software on them, give Hadoop a try. If, OTOH, you can't
disturb normal lab operation at all and need a lot of CPU power, you
should probably start weighing what your time is worth against the
cost of firing up a few EC2 instances and being done with it. I use EC2
for cryptanalytic work with a similar structure all the time, and you
can get a hell of a cluster going over there for about $15-20 an hour.
If you're an academic (it sounds like you are) you may also be able to
use things like PlanetLab and Emulab, which are free and reasonably
easy to use.

> :                 but I'd try something like CHAOS or OpenSSI to see if
> :  you can't get what you need for free, if that doesn't do it then try
> :  dropping a liveCD with Hadoop on it in each machine and running it
> :  that way.  If that can't work, try MPI. If you've gotten that far and
> :  nothing does the trick then you're probably going to have to give
> :  more details.
>
> TANSTAAFL :-)
> There is always the learning curve.
>
> If I understand it correctly, OpenSSI requires root access; is that
> right?  For CHAOS I need more details to be able to google; I found
> a fractals toolbox, but that did not seem relevant :-)

OpenSSI and CHAOS are both Single System Image clustering solutions.
They're pretty cool, but you pretty much need to be able to run a live
CD to make it worth your time.

> MPI I have tried before.  Unless there is a new, massively more
> sophisticated MPI library around now, I would certainly have to
> do my own code to cope with lost clients.

Sandia Labs has some neat work in this area, but if Hadoop fits your
computational model it will be much easier on you in terms of
implementation.
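
For what it's worth, since foobar() is pure, coping with a lost client
mostly comes down to handing the same item out again. A minimal sketch
of that idea follows, with threads in one process standing in for
remote workers and an exception standing in for a crashed machine; the
name resilient_map and the max_attempts limit are assumptions for
illustration only, not part of Hadoop or any MPI library.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def resilient_map(func, items, max_workers=4, max_attempts=3):
    """Like map(func, items), but re-submit an item whose worker failed.

    Only safe when func is pure, so running an item twice is harmless.
    """
    results = [None] * len(items)
    failures = [0] * len(items)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each in-flight future back to the index of its item.
        pending = {pool.submit(func, item): i
                   for i, item in enumerate(items)}
        while pending:
            # Iterate over a snapshot; retries are added to pending
            # and picked up on the next pass of the while loop.
            for future in as_completed(list(pending)):
                i = pending.pop(future)
                try:
                    results[i] = future.result()
                except Exception:
                    failures[i] += 1
                    if failures[i] >= max_attempts:
                        raise  # give up on an item that keeps failing
                    pending[pool.submit(func, items[i])] = i
    return results
```

A real setup would also need a timeout to notice workers that die
silently, which this sketch leaves out.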

> Hadoop sounds interesting.  I had encountered it before, but did not
> think about it.  However, the liveCD is clearly not an option.  Thanks
> for the tip; I'll read up on map-reduce at least.

Np, hope it solves things for you ;)

Geremy Condra


