[IPython-dev] IPython parallel "education"

Moritz Beber moritz.beber at gmail.com
Mon Dec 22 15:50:04 EST 2014


Hi Jose,

Just wanted to share my experience with parallel:

On Mon, Dec 22, 2014 at 6:19 PM, Jose Gomez-Dans <jgomezdans at gmail.com>
wrote:

> Hi Aron,
>
> On 18 December 2014 at 20:22, Aron Ahmadia <aron at ahmadia.net> wrote:
>
>> What happens if instead of partitioning the data, you create a list of
>> work units and map those?
>> Something like:
>>
>> def apply_the_func(i):
>>       return the_func(X[N*i:(i+1)*N])
>>
>> Y = run_func.map(apply_the_func, range(nodes))
>>
>
> This provides a substantial speed-up. I also tested other approaches
> (scatter & gather), but all in all, "pushing" X to the engines & using
> your suggestion seems to work. A question I have is what is going on behind
> the scenes when I push X around: do all the engines get a copy of the full
> X? In my case, X can be quite large, and it seems expensive to send lots
> and lots of data to engines that will only operate on a small fraction of
> the data...
>


I've been working with up to 2 GB of data, and using the push mechanism is
not really feasible at that size. Also, the transmission time increases
at least linearly (possibly superlinearly?) with the number of target
engines. So I've tried a few solutions:

1.) If you're working on a single host and don't expect to expand beyond
that, switch to multiprocessing. It's very fast at transmitting data.
2.) Store your data on the file system and have each engine read it from
there. Either you have a shared file system that the remote kernels can
access, or you'll need to copy the data over beforehand, e.g. using paramiko.
3.) Setting up a database server is quite a bit of work to invest at the
beginning (especially if you haven't done it before), but it really lends
itself to this sort of task. A database server usually has a connection pool,
so it can automatically handle many workers accessing it concurrently.
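
To make option 1 concrete, here is a minimal sketch that combines
multiprocessing with Aron's work-unit idea: each worker is sent only a
(start, stop) pair and slices the data itself. The names X, the_func,
make_work_units and the chunk count are placeholders I made up for
illustration, not anything from Jose's code. It also assumes the data can
live at module level, so that workers inherit it on fork (with the "spawn"
start method each worker re-creates it on import instead).

```python
import multiprocessing as mp

# Hypothetical stand-in for the large array X from the thread.
X = list(range(1000))


def the_func(chunk):
    """Placeholder for the real per-chunk computation."""
    return sum(chunk)


def make_work_units(n_items, n_units):
    """Split range(n_items) into contiguous (start, stop) index pairs."""
    size = max(1, -(-n_items // n_units))  # ceiling division, at least 1
    return [(start, min(start + size, n_items))
            for start in range(0, n_items, size)]


def apply_the_func(bounds):
    """Worker: receives two integers only and slices X locally."""
    start, stop = bounds
    return the_func(X[start:stop])


if __name__ == "__main__":
    units = make_work_units(len(X), 4)
    with mp.Pool(processes=4) as pool:
        # Only the small (start, stop) tuples are pickled and sent.
        partial_results = pool.map(apply_the_func, units)
    print(sum(partial_results))
```

The point of the design is that the task messages stay tiny no matter how
large X is; whether that helps in practice depends on how X gets into the
workers in the first place.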


>
> Thanks for your help
> Jose
>
Just my thoughts/experience. Best of luck with your project,
Moritz