[IPython-dev] ipcluster (LSF) timing (check if all engines are running)
MinRK
benjaminrk at gmail.com
Tue Aug 20 19:44:29 EDT 2013
That's a bug: sockets aren't properly cleaned up if the Client never
finishes connecting. It should be fixed by [PR #4074](https://github.com/ipython/ipython/pull/4074).
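
Until that fix is released, one way to avoid creating a Client over and over while the controller is still starting is to poll for the connection file on disk with an overall deadline, and construct the Client only once the file is there. This is a minimal sketch under stated assumptions, not IPython API: `wait_until` and `connection_file` are hypothetical helpers, and the default `~/.ipython/profile_<name>/security/ipcontroller-client.json` location is assumed from the IPython 1.x profile layout. A connection file left over from an earlier run would satisfy the check immediately, so it may be worth deleting it before starting ipcluster.

```python
import os
import time


def connection_file(profile="cluster", ipython_dir="~/.ipython"):
    """Hypothetical helper: path of the client connection file, ASSUMING
    the IPython 1.x layout <ipython_dir>/profile_<name>/security/."""
    return os.path.join(os.path.expanduser(ipython_dir),
                        "profile_" + profile, "security",
                        "ipcontroller-client.json")


def wait_until(predicate, timeout=600.0, interval=10.0,
               sleep=time.sleep, clock=time.time):
    """Hypothetical helper: call predicate() every `interval` seconds until
    it returns a truthy value (-> True) or `timeout` seconds pass (-> False).
    `sleep` and `clock` are injectable so the loop can be tested offline."""
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # never sleep past the deadline
        sleep(min(interval, remaining))
```

Usage, assuming the default profile layout: call `wait_until(lambda: os.path.exists(connection_file("cluster")), timeout=3600)` before constructing `parallel.Client(profile="cluster")` once, then reuse the same helper with `lambda: len(client) >= 128` to wait for the engines. The explicit deadline is a choice aimed at batch jobs: if LSF never schedules the engines, the script can fail with a clear message instead of sleeping until the job's wall-clock limit.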
On Tue, Aug 20, 2013 at 3:20 PM, Florian M. Wagner <wagnerfl at student.ethz.ch> wrote:
> Hey Min,
>
> thanks for the example. The first while loop waits for the JSON file
> as expected, but when I start the cluster and it finds the file, a ZeroMQ
> error occurs: Too many open files (signaler.cpp:330)
> Do you have an idea?
>
> On 19.08.2013 16:28, MinRK wrote:
>
> Something like this should work:
>
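> from __future__ import print_function
>
> import sys
> import time
>
> from IPython import parallel
> from IPython.parallel.error import TimeoutError
>
> def wait_for_cluster(engines=1, **kwargs):
>     """Wait for an IPython cluster to start up and register a minimum
>     number of engines. Returns the connected Client."""
>     # wait for the controller to come up
>     while True:
>         try:
>             client = parallel.Client(**kwargs)
>         except IOError:
>             # the connection file has not been written yet
>             print("No ipcontroller-client.json, waiting...")
>             time.sleep(10)
>         except TimeoutError:
>             # the connection file exists, but the controller did not answer
>             print("No controller, waiting...")
>             time.sleep(10)
>         else:
>             # connected: stop creating new Clients
>             break
>     if not engines:
>         return client
>     # wait for engines to register, printing one dot per engine
>     print("waiting for %i engines" % engines, end=" ")
>     running = len(client)
>     sys.stdout.write('.' * running)
>     sys.stdout.flush()
>     while running < engines:
>         time.sleep(1)
>         previous = running
>         running = len(client)
>         sys.stdout.write('.' * (running - previous))
>         sys.stdout.flush()
>     print()
>     return client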
>
>
>
> On Mon, Aug 19, 2013 at 6:34 AM, Florian M. Wagner <wagnerfl at student.ethz.ch> wrote:
>
>> Hey all,
>>
>> I am using IPython.parallel on a large cluster, where controller and
>> engines are launched via LSF. My current workflow is as follows:
>>
>> #!/bin/bash
>> python pre_processing.py
>> ipcluster start --profile=cluster --n=128 > ipcluster.log 2>&1
>> sleep 120
>> python main_computation.py
>> python post_processing.py
>>
>>
>> I am not entirely happy with this, since the 2 minutes are not always
>> enough, depending on the load of the cluster. I believe there must be a
>> more elegant way to launch the cluster and check that all the engines are
>> running before proceeding with the main computation. I would highly
>> appreciate any help.
>>
>> Best regards
>> Florian
>>
>>
>> _______________________________________________
>> IPython-dev mailing list
>> IPython-dev at scipy.org
>> http://mail.scipy.org/mailman/listinfo/ipython-dev