[IPython-dev] ipcluster (LSF) timing (check if all engines are running)

MinRK benjaminrk at gmail.com
Tue Aug 20 19:44:29 EDT 2013


That's a bug: sockets aren't properly cleaned up if the Client never
finishes connecting. It should be fixed by
[PR #4074](https://github.com/ipython/ipython/pull/4074).
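
Until that fix is available, a possible stopgap is to poll for cheap conditions instead of repeatedly constructing a Client. The sketch below is a generic polling helper with a timeout, not part of IPython. The names `wait_until` and `wait_for_file` and their parameters are made up for illustration, and the path of the controller's connection file is left to the caller. It only reduces how often a Client has to be constructed before the controller is ready; it does not replace the fix in the PR.

```python
import os
import time


def wait_until(predicate, timeout=600.0, interval=1.0):
    """Poll predicate() until it returns a truthy value or `timeout` seconds pass.

    Returns True if the predicate succeeded, False if the timeout expired.
    (Hypothetical helper, not an IPython API.)
    """
    deadline = time.time() + timeout
    while True:
        if predicate():
            return True
        if time.time() >= deadline:
            return False
        time.sleep(interval)


def wait_for_file(path, timeout=600.0, interval=1.0):
    """Wait for a file, e.g. the controller's connection file, to appear."""
    return wait_until(lambda: os.path.exists(path), timeout, interval)
```

With an existing `client`, the same helper could also bound the engine wait, e.g. `wait_until(lambda: len(client) >= 128, timeout=1800)`, so that a job does not hang forever if some engines never register.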


On Tue, Aug 20, 2013 at 3:20 PM, Florian M. Wagner <wagnerfl at student.ethz.ch> wrote:

>  Hey Min,
>
> thanks for the example. The first while loop waits for the JSON file
> as expected, but once I start the cluster and the file is found, a ZeroMQ
> error occurs: Too many open files (signaler.cpp:330)
> Do you have any idea what causes this?
>
> On 19.08.2013 16:28, MinRK wrote:
>
> Something like this should work:
>
>  import sys
>  import time
>
>  from IPython import parallel
>
>  def wait_for_cluster(engines=1, **kwargs):
>     """Wait for an IPython cluster to start up and register a minimum
>     number of engines."""
>     # wait for the controller to come up
>     while True:
>         try:
>             client = parallel.Client(**kwargs)
>         except IOError:
>             # the connection file has not been written yet
>             print("No ipcontroller-client.json, waiting...")
>             time.sleep(10)
>         except parallel.TimeoutError:
>             # the connection file exists, but the controller did not answer
>             print("No controller, waiting...")
>             time.sleep(10)
>         else:
>             # connected: stop retrying
>             break
>     if not engines:
>         return
>     # wait for engines to register, printing one dot per registered engine
>     sys.stdout.write("waiting for %i engines" % engines)
>     running = len(client)
>     sys.stdout.write('.' * running)
>     sys.stdout.flush()
>     while running < engines:
>         time.sleep(1)
>         previous = running
>         running = len(client)
>         sys.stdout.write('.' * (running - previous))
>         sys.stdout.flush()
>     sys.stdout.write('\n')
>
>
>
> On Mon, Aug 19, 2013 at 6:34 AM, Florian M. Wagner <
> wagnerfl at student.ethz.ch> wrote:
>
>>  Hey all,
>>
>> I am using IPython.parallel on a large cluster, where controller and
>> engines are launched via LSF. My current workflow is as follows:
>>
>> #!/bin/bash
>> python pre_processing.py
>> ipcluster start --profile=cluster --n=128 > ipcluster.log 2>&1
>> sleep 120
>> python main_computation.py
>> python post_processing.py
>>
>>
>> I am not entirely happy with this, since the 2 minutes are not always
>> enough, depending on the load of the cluster. I believe there is a much
>> more elegant way to launch the cluster and check that all the engines are
>> running before proceeding with the main computation. I would highly
>> appreciate any help.
>>
>> Best regards
>> Florian
>>
>>
>> _______________________________________________
>> IPython-dev mailing list
>> IPython-dev at scipy.org
>> http://mail.scipy.org/mailman/listinfo/ipython-dev
>>
>>
>
>