[SciPy-user] Multiprocessing, GUIs and IPython

Gael Varoquaux gael.varoquaux at normalesup.org
Wed Jan 7 17:39:50 EST 2009


On Wed, Jan 07, 2009 at 12:00:03PM -0800, Brian Granger wrote:
> I see that people are starting to use multiprocessing to parallelize
> numerical Python code.  I am wondering if we want to allow/recommend
> using multiprocessing in scipy.

Too late! I use it in almost all my code :). OK, none of this is in Scipy,
but multiprocessing is starting to creep into various places.

> * Currently multiprocessing doesn't play well with IPython.  Thus, if
> scipy starts to use multiprocessing, people will get very unpleasant
> surprises when using IPython.  I don't know exactly what the problems
> are, but my feeling is that it is unlikely that IPython will ever have
> *full* support for multiprocessing.  Some support might be possible,
> though.

As Robert points out, that's because of wizardry done by IPython. That's
really a pity, because in my experience, multiprocessing is fairly
robust. Nothing that's not fixable from IPython's side, though, I
believe.

> * Multiprocessing doesn't play well with other things as well, such as
> Twisted.  Again, if scipy uses multiprocessing, it would become
> unusable within Twisted based servers.

IMHO that's a bug of Twisted :). More seriously, multiprocessing is now
in the standard library. It may have some quirks, but I think everybody
should try and play well with it, and I wouldn't be surprised to see
things improving as people get familiar with it.

> What experience have others had with using multiprocessing in these
> contexts?  Success?  Failure?

I have tried every solution for parallel computing, and for
single-machine parallel computing, multiprocessing is my favorite option.
The reason is that its API for spawning and killing processes is
really light and quick (fork gives you speed). It does not eat many
resources, and it allows sharing of arrays or other types. It implements
a very lightweight form of parallel computing, which is very much what I
need. Moreover, fork gives automatic distribution of globals, which I
like a lot. On the other hand, error management is less than ideal.
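
To make the points above concrete, here is a minimal sketch of the
fork-based pattern, written against a recent Python 3 on a Unix-like
system (an assumption: the "fork" start method is not available on
Windows). The names DATA, square_slice and parallel_squares are
hypothetical and only serve the illustration. The children read a
module-level global without it being passed to them, write into a
shared array, and the parent has to inspect exit codes itself because
an exception in a child does not propagate.

```python
import multiprocessing as mp

# Module-level data. With the "fork" start method each child starts as a
# copy of the parent process, so it sees DATA without any explicit passing.
DATA = list(range(8))


def square_slice(shared, start, stop):
    """Square DATA[start:stop] and store the results in the shared array."""
    for i in range(start, stop):
        shared[i] = DATA[i] ** 2


def parallel_squares(n_workers=2):
    # Assumption: a platform where the "fork" start method exists.
    ctx = mp.get_context("fork")
    # Shared-memory array of C doubles, visible to parent and children.
    shared = ctx.Array("d", len(DATA))
    step = (len(DATA) + n_workers - 1) // n_workers
    workers = [
        ctx.Process(target=square_slice,
                    args=(shared, lo, min(lo + step, len(DATA))))
        for lo in range(0, len(DATA), step)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # An exception raised in a child is not re-raised in the parent;
    # the only trace left here is a non-zero exit code.
    failed = [w.pid for w in workers if w.exitcode != 0]
    if failed:
        raise RuntimeError("worker(s) failed: %s" % failed)
    return shared[:]


if __name__ == "__main__":
    print(parallel_squares())
```

The manual exit-code check is the kind of bookkeeping that the remark
on error management refers to: without it, a failing worker would
silently leave zeros in its part of the array.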

I must admit I would really like to see IPython using multiprocessing as
a backend for single-computer parallel computing (I have 8 cores, so I do
a lot of that). I don't know if it is compatible with IPython's
architecture. Specifically, I would like to be able to use the same API
as IPython, with a fork-based mechanism. I would also like the easy
process management.

> Based on that, what do other people recommend and think about using
> multiprocessing in scipy or numpy?  I guess this also applies to any
> other project in this realm (sympy, pymc, ETS, matplotlib, etc., etc.).

I think there are several solutions for parallel computing with Python.
Right now they all have pros and cons. We need to strive to support as
many as possible. Multiprocessing is especially important since it comes
with the standard library.

My 2 cents,

Gaël


