[IPython-dev] 0.11rc1 : problem with tutorial for PBS in http://ipython.org/ipython-doc/dev/parallel/parallel_process.html

Johann Cohen-Tanugi johann.cohentanugi at gmail.com
Mon Jul 4 16:01:33 EDT 2011


Good evening. I am still trying to make the PBS batch parallel code work. 
I had to comment out the "-t" line in launcher.py, but I am still puzzled by 
the fact that there is no loop over n to start n different engines. Is 
that because the '-t' was precisely there to create an array of subjobs?
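
To make the question concrete, here is a minimal sketch of what I assume the launcher does, based only on the runtime output and the PBSLauncher attributes quoted in my earlier message below. The helper name add_job_array and the placeholder engine command are hypothetical, and this is not the actual launcher.py code: if the template has no '-t' line, one is inserted after the shebang, and the scheduler (not a Python loop) would then be responsible for starting the n subjobs.

```python
import re

# Pattern and directive copied from the PBSLauncher attributes quoted below.
JOB_ARRAY_REGEXP = re.compile(r'#PBS\W+-t\W+[\w\d\-\$]+')
JOB_ARRAY_TEMPLATE = '#PBS -t 1-{n}'


def add_job_array(template, n):
    """Return the batch script with a job-array directive for n subjobs.

    Hypothetical helper: a template that already has a '-t' line is returned
    unchanged; otherwise the directive is inserted after the first line.
    """
    if JOB_ARRAY_REGEXP.search(template):
        return template
    first_line, _, rest = template.partition('\n')
    return '\n'.join([first_line, JOB_ARRAY_TEMPLATE.format(n=n), rest])


# Placeholder template; the last line stands in for the real engine command.
user_template = """#!/bin/sh
#PBS -V
#PBS -N ipengine
echo "start one engine here"
"""

print(add_job_array(user_template, 4))
```

If that reading is right, a batch system without a '-t' option would need either a different array directive or an explicit loop over n submissions.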

Second question, more general: assuming the use of ipcluster, a 
controller and several engines are created. Following the tutorial, all 
of them would actually run in batch, which seems strange to me for the 
controller: batch queues usually have time limits, and it is 
unavoidable that engines would die when the CPU time is exceeded, but I 
do not see why the controller should suffer from this. What would be the 
rationale for executing the controller in batch rather than locally? Third 
question: once the engines run in batch, I presume that they listen to 
commands sent from any ipython session that I would start interactively, 
provided I use Client() with the correct permissions in terms of 
ports, ssh, etc. Is that correct, i.e. is that indeed the idea?

Sorry to be dense about all that. I think it would be useful if the 
batch doc page were supplemented with the final step, which amounts to 
starting an interactive ipython session and connecting to the batch engines.
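
For what it is worth, this is roughly what I imagine that final step would look like. It is only a sketch under assumptions: the profile name 'pbs' matches the one used with ipcluster, the connection files in that profile directory are readable from the interactive machine, and the helper names connect and run_on_all_engines are mine, not part of IPython.

```python
def connect(profile='pbs'):
    """Return a Client attached to the controller of the given profile."""
    # Assumption: IPython 0.11-era import path. Later releases moved this
    # code into the separate ipyparallel package.
    from IPython.parallel import Client
    return Client(profile=profile)


def run_on_all_engines(client, func, *args):
    """Call func(*args) on every engine; return results keyed by engine id."""
    view = client[:]  # view over all engines currently registered
    results = view.apply_sync(func, *args)  # blocks until all engines reply
    return dict(zip(client.ids, results))


# In an interactive session, once the batch engines have registered:
#   import os
#   rc = connect('pbs')
#   print(rc.ids)                             # ids of the connected engines
#   print(run_on_all_engines(rc, os.getpid))  # one process id per engine
```

An empty rc.ids would simply mean that the batch jobs have not started yet or could not reach the controller.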

will continue digging,
best.
Johann

On 07/04/2011 05:07 PM, Johann Cohen-Tanugi wrote:
> Hi there, my problem is that a line seems to be added to the
> template I defined following the tutorial. At runtime, the template
> proposed in the tutorial is modified to:
>
> #!/bin/sh
> #PBS -t 1-4<----------------- incorrect?
> #PBS -V
> #PBS -N ipengine
> /usr/local/bin/python
> /sps/glast/users/cohen/IPYDEV/ipython/IPython/parallel/apps/ipengineapp.py
> profile_dir=/afs/in2p3.fr/home/t/tanugi/\
> .ipython/profile_pbs
>
>
> The problem, I believe, is in the job_array_template in:
>
> class PBSLauncher(BatchSystemLauncher):
>       """A BatchSystemLauncher subclass for PBS."""
>
>       submit_command = List(['qsub'], config=True,
>           help="The PBS submit command ['qsub']")
>       delete_command = List(['qdel'], config=True,
>           help="The PBS delete command ['qsub']")
>       job_id_regexp = Unicode(r'\d+', config=True,
>           help="Regular expresion for identifying the job ID [r'\d+']")
>
>       batch_file = Unicode(u'')
>       job_array_regexp = Unicode('#PBS\W+-t\W+[\w\d\-\$]+')
>       job_array_template = Unicode('#PBS -t 1-{n}')
>       queue_regexp = Unicode('#PBS\W+-q\W+\$?\w+')
>       queue_template = Unicode('#PBS -q {queue}')
>
>
> I looked at the PBS doc for versions 10 and 11 and I did not see any '-t'
> option. When I try to run, I get:
> [tanugi@ccali28 test_directory]$ ipcluster start profile=pbs n=4
> [IPClusterStart] Using existing profile dir: u'/afs/in2p3.fr/home/t/tanugi/.ipython/profile_pbs'
> [IPClusterStart] Starting ipcluster with [daemon=False]
> [IPClusterStart] Creating pid file: /afs/in2p3.fr/home/t/tanugi/.ipython/profile_pbs/pid/ipcluster.pid
> [IPClusterStart] Starting PBSControllerLauncher: ['qsub', u'/afs/in2p3.fr/home/t/tanugi/.ipython/profile_pbs/pbs_controller']
> [IPClusterStart] adding job array settings to batch script
> [IPClusterStart] Writing instantiated batch script: /afs/in2p3.fr/home/t/tanugi/.ipython/profile_pbs/pbs_controller
> unknown -t option
> ERROR:root:Error in periodic callback
> Traceback (most recent call last):
>   File "/sps/glast/users/cohen/IPYDEV/local/lib/python2.6/site-packages/zmq/eventloop/ioloop.py", line 432, in _run
>     self.callback()
>   File "/sps/glast/users/cohen/IPYDEV/ipython/IPython/parallel/apps/ipclusterapp.py", line 364, in start_controller
>     self.profile_dir.location
>   File "/sps/glast/users/cohen/IPYDEV/ipython/IPython/parallel/apps/launcher.py", line 943, in start
>     return super(PBSControllerLauncher, self).start(1, profile_dir)
>   File "/sps/glast/users/cohen/IPYDEV/ipython/IPython/parallel/apps/launcher.py", line 902, in start
>     job_id = self.parse_job_id(output)
>   File "/sps/glast/users/cohen/IPYDEV/ipython/IPython/parallel/apps/launcher.py", line 854, in parse_job_id
>     raise LauncherError("Job id couldn't be determined: %s" % output)
> LauncherError: Job id couldn't be determined:
>
> Not sure yet about the traceback, but the "unknown -t option" is clear.
> Furthermore, I wonder whether we really want to add lines to a
> template file provided by the user.
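
A possible reading of the traceback, sketched under assumptions: qsub rejected the script, so nothing reached its standard output, and the default job_id_regexp quoted above then had nothing to match. The LauncherError class and parse_job_id function below are stand-ins that mimic the names in the traceback, not the launcher.py code itself, and the sample qsub reply is invented for illustration.

```python
import re

JOB_ID_REGEXP = re.compile(r'\d+')  # default value quoted in PBSLauncher


class LauncherError(Exception):
    """Stand-in for the exception named in the traceback."""


def parse_job_id(output):
    """Return the first run of digits found in the submit command's output."""
    match = JOB_ID_REGEXP.search(output)
    if match is None:
        raise LauncherError("Job id couldn't be determined: %s" % output)
    return match.group()


# Hypothetical reply from a successful submission.
print(parse_job_id("1234567.pbs-server"))

# Empty output, as would happen if the error text only went to stderr.
try:
    parse_job_id("")
except LauncherError as err:
    print(err)
```

That would also explain why the LauncherError message ends right after the colon.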
>
> best,
> Johann


