[IPython-dev] IPython1 parallel questions

Fri Dec 15 19:43:30 EST 2006

Brian,

Thank you for the response.

I've been moving forward with the integration of my library with
IPython. Unfortunately, I've come across an issue that has prevented
our library from functioning correctly.

First, let me give a little background. Right now, our library is
loaded in each engine by importing it as a python module. There are
cases in our library when a function will be executed on each node,
but one node may take slightly longer to process. For instance, when
reading input, node 0 is responsible for scanning the file and
distributing data to each process. Code of this kind uses
MPI_Win_lock and MPI_Win_unlock when sending data to other processes.

The issue I've come across is that these calls to MPI code sometimes
deadlock. What I think is going on is that the functions in our
library that do relatively little return first, causing the IPEngine
to continue to its input handler code (in effect blocking on the
engine's socket). If, after this has happened, another node calls
MPI_Win_lock or MPI_Win_unlock, the call will deadlock because MPI is
unable to communicate with the other nodes that are now blocked in
IPEngine code.

At least, this is my educated guess as to what is happening.

Is this an issue that you've had to work out before?

Is there some way to ensure that each engine will not block on its
socket until every engine has had a chance to execute the user code
completely?

I guess what I'm thinking of is something similar to an MPI_Barrier,
such that each engine would do the following:

execute(user_code)
Barrier()
# once every engine has reached this point, the engines can return to
# their internal communication code
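
A minimal sketch of this execute-then-barrier pattern follows. It is not
IPython1 code: the engines are simulated with threads, and the names
run_then_barrier, simulate, and barrier_wait are hypothetical, chosen
only for illustration.

```python
import threading


def run_then_barrier(user_code, namespace, barrier_wait):
    """Run user_code, then wait at a barrier before returning to the caller."""
    exec(user_code, namespace)
    # No engine goes back to its socket loop until every engine arrives here.
    barrier_wait()


def simulate(n_engines=4):
    """Simulate n_engines engines with threads; each sleeps a different time."""
    barrier = threading.Barrier(n_engines)
    finished = []        # ranks whose user code has completed
    seen_at_return = []  # how many engines had finished when each one returned
    lock = threading.Lock()

    user_code = (
        "import time\n"
        "time.sleep(0.01 * rank)\n"   # engines take unequal time
        "with lock:\n"
        "    finished.append(rank)\n"
    )

    def engine(rank):
        namespace = {"rank": rank, "finished": finished, "lock": lock}
        run_then_barrier(user_code, namespace, barrier.wait)
        with lock:
            seen_at_return.append(len(finished))

    threads = [threading.Thread(target=engine, args=(r,)) for r in range(n_engines)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen_at_return


print(simulate(4))
```

With real MPI engines, and assuming mpi4py is installed, barrier_wait
could be MPI.COMM_WORLD.Barrier rather than a thread barrier.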

I greatly appreciate your feedback.

Thank you,
~doug

On 12/14/06, Brian Granger <ellisonbg.net at gmail.com> wrote:
> Douglas,
>
> See my comments inline below.
>
> > I recently discovered IPython1 and its parallel capabilities. Once I
> > read about the feature set, I became extremely excited as IPython1
> > seems to solve many of the interactive parallel computing problems
> > that I had hoped to solve for a project that I am a developer on.
> >
> > Let me start by framing my project. Our code currently exists as a
> > library written in C++. Our library does parallel computation using
> > MPI. We currently have code that exposes this to python in an
> > interactive manner, and have been looking for ways to expand this to a
> > fuller, more robust implementation. I'm hoping that IPython1 will be
> > the perfect piece to create our collaborative, interactive
> > environment.
>
> Nice, this is one of the main use cases we had in mind when
> designing ipython1.  It should work well for this.
>
> > That said, as I investigate IPython further and start to develop some
> > prototype code, I have some questions for the community and
> > developers.
> >
> > 1) Are there any outstanding issues or problems with the IPython1 code
> > base that inhibits parallel computation? I've been experimenting with
> > the code and its features for the past few days in addition to looking
> > through the bug list, and from what I can tell there are none. Are
> > there any features mentioned in IPython's documentation or the
> > presentations on the site that haven't been implemented yet?
>
> Not really.  IPython1 is already being used for real science and
> people don't seem to be limited in any significant way.  With that
> said, there are a number of things we are still working on.  Here is a
> taste:
>
> 1.  Optimizing how large objects are sent between the client and
> engines.  In our current approach, the controller becomes a bottleneck
> when you try to use push/pull to send really big things (> 100s of
> MBs).  With that said, if you are wanting to send large objects
> around, you might want to rethink how you are parallelizing your
> algorithm.
>
> 2.  Task farming.  While the architecture is set up for task farming,
> we have not implemented a few parts of it.  We are actively (as in
> this week) working on this.
>
> 3.  Security.
>
> 4.  Scalability.  Currently all the engines connect to a single
> controller.  We have tested this on 128 processors and it works fine.
> The problem is that most systems have a per-process file-descriptor
> limit that is not much higher than that (like 256).  We would like to
> explore ways (multiple controllers?) of getting it to scale beyond
> 128-256.
>
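To see how close a given machine is to the limit described in item 4,
the per-process descriptor limit can be queried from Python on Unix.
This is a small sketch using the standard resource module; the function
name and the reserved allowance are assumptions for illustration, not
something IPython1 defines.

```python
import resource


def max_engine_connections(reserved=16):
    """Return (soft, hard, usable) file-descriptor counts for this process.

    'reserved' is an assumed allowance for stdin/stdout, log files, and
    listening sockets; usable is None when the soft limit is unlimited.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard, None
    return soft, hard, max(soft - reserved, 0)


soft, hard, usable = max_engine_connections()
print("soft limit:", soft, "hard limit:", hard, "usable for engines:", usable)
```
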
> > 2) Is there any authentication or security between the client shell
> > and the IPython controller?
>
> Not yet, but we are working on it.
>
> > 3) What are the proper ways to configure the operation of the client
> > and the controller or the engines? From what I've seen it looks like
> > there is an API that allows the user to set certain options. Can
> > configuration files be used?
>
> Yes, we have a very powerful configuration system.  For examples see
> the ipython1/configfiles directory.  You can simply copy these over to
> your ~/.ipython directory and start playing around.
> The documentation on this part of things is still not great though.
> Please let us know if you have questions.
>
> > Some configuration issues I'm thinking of right now involve output.
> > For instance, is there a way to turn off the feedback from each node
> > for every line of code? I would imagine that if you had a cluster with
> > many nodes running, you would not want to see this feedback. Another
> > issue is how output on stdout from C++ code is handled. I noticed that
> > right now, any output on stdout from a simple C++ module that I have
> > exposed to python and loaded in each engine is simply written by the
> > engine to the console. Is there some way to redirect this
> > output?
>
> There are two ways of executing code: blocking and non-blocking:
>
> rc.executeAll('a=10', block=True)
>
> This mode will wait until the command has been run and it will bring
> back the stdout/stderr and print it to the screen.  If the remote
> command is long running, your local ipython session will remain
> blocked until the command is complete.
>
> rc.executeAll('time.sleep(1000000)', block=False)
>
> In non-blocking mode, execute returns immediately after _submitting_
> the command.  Furthermore, it won't automatically bring back and print
> the stdout/stderr of the command.
>
> You can also set all commands to block or not by using the block
> attribute of the RemoteController object:
>
> rc.block = False
>
> I usually use block=True for debugging, but then set block=False for
> long running things.  Also, you can always get the stdout/stderr of a
> previously run command by using the %result magic:
>
> %result       # print the stdout/stderr of the last remote command
>
> %result 10    # print the stdout/stderr of the 10th remote command
>
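The difference between the two modes can be illustrated with a
standard-library analogy. The sketch below is not the IPython1
RemoteController: ToyController and execute_all are hypothetical
stand-ins built on concurrent.futures. It shows only that a blocking
call waits for every result, while a non-blocking call returns handles
immediately so the results can be collected later.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class ToyController:
    """Hypothetical stand-in for a controller; not the IPython1 API."""

    def __init__(self, n_engines=2):
        self.block = True  # default mode, overridable per call
        self._n = n_engines
        self._pool = ThreadPoolExecutor(max_workers=n_engines)

    def execute_all(self, fn, block=None):
        """Run fn(engine_id) on every simulated engine."""
        block = self.block if block is None else block
        futures = [self._pool.submit(fn, i) for i in range(self._n)]
        if block:
            # Blocking: wait here until every engine has finished.
            return [f.result() for f in futures]
        # Non-blocking: hand back the pending futures right away.
        return futures

    def close(self):
        self._pool.shutdown()


def slow_square(engine_id):
    time.sleep(0.05)  # pretend this is a long-running remote command
    return engine_id ** 2


rc = ToyController(n_engines=3)
blocking_results = rc.execute_all(slow_square, block=True)
pending = rc.execute_all(slow_square, block=False)  # returns immediately
later_results = [f.result() for f in pending]       # collect when needed
rc.close()
print(blocking_results, later_results)
```
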
> > 4) When objects are sent from the kernel to the client, is the only
> > prerequisite to this that the object being sent is able to be pickled?
> > I suppose the client would also have to have the class code for any
> > objects imported?
>
> Yes.  Numpy arrays are sent using their raw buffers so for that case
> you don't have to pay the price of pickling.
>
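A short illustration of the pickling requirement, using only the
standard pickle module. Particle and is_picklable are hypothetical
names. The point is that a pickle stores a reference to the class by
name rather than its code, so the receiving side must be able to import
that class.

```python
import pickle


class Particle:
    """Hypothetical user class that both sender and receiver can import."""

    def __init__(self, x, v):
        self.x = x
        self.v = v


def is_picklable(obj):
    """Return True if obj can be serialized with pickle."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


payload = pickle.dumps(Particle(1.0, -2.5))
# The byte stream names the class; it does not contain the class's code.
name_in_payload = b"Particle" in payload
restored = pickle.loads(payload)
print(restored.x, restored.v, name_in_payload)

# A lambda has no importable name, so it cannot be pickled.
print(is_picklable(lambda x: x))
```
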
> > 5) Are there any known projects that use IPython1 for interactive
> > scientific computing? I'd be really interested to see ones that also
> > support visualization of distributed data.
>
> Yes.  Two examples:
>
> 1.  At my company (Tech-X), there is a group of people using ipython1
> to do parallel data analysis on supercomputers.  They start with
> 50-100 GBs of data in 1000-2000 hdf5 data files and need to do a bunch
> of calculations that involve data from multiple files.  They use
> ipython1 to first reduce the data they want (in parallel) to a single
> file and then run an algorithm (again in parallel) over a set of
> parameters.  They were able to parallelize this code in 2 days and
> it shows nice linear scaling.
>
> 2.  Fernando Perez (the creator of ipython) is using ipython1 in his
> research in applied mathematics.  His algorithm uses multiresolution
> analysis to solve high dimensional partial differential equations.  He
> has just begun (in the last 2 weeks) to parallelize his code using
> ipython1, so I don't think he is in production mode.  His case is also
> non-trivial as it 1) needs automatic load balancing and 2) has lots of
> communications.
>
> The reason we started working on this is that both Fernando and I are
> scientists (both got our Ph.D.'s in Theoretical Physics) and we wanted
> these tools to exist so we could use them for our own research.
>
> There are some other folks on the list that have been playing around
> with ipython1, but I am not sure if anyone else has moved into
> production mode yet.
>
> > Well, I think that covers all my initial questions. Thank you for
> > bearing this long post and my newbie questions. I'm really looking
> > forward to working more with IPython as it seems like an amazing piece
> > of software.
>
> Thanks!  Please let us know if you have more questions/comments or ideas.
>
> Brian
>
> > Thank you,
> > ~doug