[IPython-dev] SciPy Sprint summary

Justin Riley justin.t.riley at gmail.com
Fri Jul 23 17:54:50 EDT 2010


Hi Satrajit/Matthieu,

Satrajit, for now I've set /bin/sh as the shell for all generated
scripts (PBS/SGE/LSF), since it's probably the most commonly available
shell on *NIX systems. Should we still add a --shell option? If the
user passes their own script they can of course choose whatever shell
they like, but otherwise I'd expect /bin/sh with the generated code to
work for most folks. If a --shell option still makes sense I'll add it
in.
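
To make that concrete, here is a rough sketch (purely illustrative --
not the actual 0.10.1-sge code, and the function/option names are
hypothetical) of how a --shell option could plug into the generated
script header (SGE shown):

    def script_header(shell=None):
        # Default to /bin/sh unless a (hypothetical) --shell value is given.
        shell = shell or '/bin/sh'
        # For SGE, '#$ -S' selects the shell; depending on the queue's
        # shell_start_mode, the shebang line alone may be ignored.
        return '#!%s\n#$ -S %s\n' % (shell, shell)

    # script_header()            -> "#!/bin/sh\n#$ -S /bin/sh\n"
    # script_header('/bin/bash') -> "#!/bin/bash\n#$ -S /bin/bash\n"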

Matthieu, I updated my 0.10.1-sge branch to address the LSF shell
redirection issue. Basically I create a small bsub wrapper script that
does the stdin redirection and then pass that wrapper to
getProcessOutput. I don't believe Twisted's getProcessOutput will
handle stdin redirection itself, so this is my solution for now. Would
you mind testing this new code with LSF?
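
For reference, the wrapper approach looks roughly like this (a sketch
only -- the function name and details below are illustrative, not the
exact code in the branch):

    import os, stat, tempfile
    from twisted.internet.utils import getProcessOutput

    def submit_bsub(batch_script):
        # bsub reads its job script from stdin, and getProcessOutput has
        # no stdin redirection, so wrap the redirection in a tiny script.
        fd, wrapper = tempfile.mkstemp(suffix='.sh')
        os.write(fd, '#!/bin/sh\nbsub < %s\n' % batch_script)
        os.close(fd)
        os.chmod(wrapper, stat.S_IRWXU)
        # Run the wrapper itself; getProcessOutput returns a Deferred
        # that fires with bsub's stdout.
        return getProcessOutput(wrapper, env=os.environ)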

I've also updated the parallel_process.txt docs for ipcluster. Let me
know what you guys think.

~Justin

On Fri, Jul 23, 2010 at 3:19 PM, Satrajit Ghosh <satra at mit.edu> wrote:
> if i add the following line to the sge script to match my shell, it works
> fine. perhaps we should allow specifying the shell as an option, like queue,
> and by default set it to the user's shell?
>
> #$ -S /bin/bash
>
> cheers,
>
> satra
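
As an aside, picking up the user's login shell as that default could be
as simple as the following sketch (illustrative only, not code from the
branch):

    import os, pwd
    # Prefer $SHELL; fall back on the password database, then /bin/sh.
    default_shell = (os.environ.get('SHELL')
                     or pwd.getpwuid(os.getuid()).pw_shell
                     or '/bin/sh')
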
>
>
>
> On Wed, Jul 21, 2010 at 11:58 PM, Satrajit Ghosh <satra at mit.edu> wrote:
>>
>> hi justin,
>>
>>> 1. By cleanly installed, do you mean SGE in addition to
>>> ipython/ipcluster?
>>
>> no just the python environment.
>>
>>>
>>> 2. From the job output you sent me previously (when it wasn't working) it
>>> seems that there might have been a mismatch in the shell that was used given
>>> that the output was complaining about "Illegal variable name". I've noticed
>>> that SGE likes to assign csh by default on my system if I don't specify a
>>> shell at install time.  What is the output of "qconf -sq all.q | grep -i
>>> shell" for you?
>>
>> (nipype0.3)satra at sub:/tmp$ qconf -sq all.q | grep -i shell
>> shell                 /bin/sh
>> shell_start_mode      unix_behavior
>>
>>  (nipype0.3)satra at sub:/tmp$ qconf -sq sub | grep -i shell
>> shell                 /bin/csh
>> shell_start_mode      posix_compliant
>>
>> (nipype0.3)satra at sub:/tmp$ qconf -sq twocore | grep -i shell
>> shell                 /bin/bash
>> shell_start_mode      posix_compliant
>>
>> only twocore worked. all.q and sub didn't. choosing either of those two
>> (all.q or sub) leaves the job stuck in the qw state.
>>
>> my default shell is bash.
>>
>> cheers,
>>
>> satra
>>
>>>
>>> Thanks!
>>> ~Justin
>>> On Wed, Jul 21, 2010 at 9:05 PM, Satrajit Ghosh <satra at mit.edu> wrote:
>>>>
>>>> hi justin.
>>>>
>>>> i really don't know what the difference is, but i clean installed
>>>> everything and it works beautifully on SGE.
>>>>
>>>> cheers,
>>>>
>>>> satra
>>>>
>>>>
>>>> On Tue, Jul 20, 2010 at 4:04 PM, Brian Granger <ellisonbg at gmail.com>
>>>> wrote:
>>>>>
>>>>> Great!  I mean great that you and Justin are testing and debugging
>>>>> this.
>>>>>
>>>>> Brian
>>>>>
>>>>> On Tue, Jul 20, 2010 at 1:01 PM, Satrajit Ghosh <satra at mit.edu> wrote:
>>>>> > hi brian,
>>>>> >
>>>>> > i ran into a problem (my engines were not starting) and justin and i
>>>>> > are
>>>>> > going to try and figure out what's causing it.
>>>>> >
>>>>> > cheers,
>>>>> >
>>>>> > satra
>>>>> >
>>>>> >
>>>>> > On Tue, Jul 20, 2010 at 3:19 PM, Brian Granger <ellisonbg at gmail.com>
>>>>> > wrote:
>>>>> >>
>>>>> >> Satra,
>>>>> >>
>>>>> >> If you could test this as well, that would be great.  Thanks.
>>>>> >> Justin, let us know when you think it is ready to go with the
>>>>> >> documentation and testing.
>>>>> >>
>>>>> >> Cheers,
>>>>> >>
>>>>> >> Brian
>>>>> >>
>>>>> >> On Tue, Jul 20, 2010 at 7:48 AM, Justin Riley
>>>>> >> <justin.t.riley at gmail.com>
>>>>> >> wrote:
>>>>> >> > On 07/19/2010 01:06 AM, Brian Granger wrote:
>>>>> >> >> * I like the design of the BatchEngineSet.  This will be easy to
>>>>> >> >>   port to 0.11.
>>>>> >> > Excellent :D
>>>>> >> >
>>>>> >> >> * I think if we are going to have default submission templates, we
>>>>> >> >>   need to expose the queue name to the command line.  This
>>>>> >> >>   shouldn't be too tough.
>>>>> >> >
>>>>> >> > Added a --queue option to my 0.10.1-sge branch and tested it with
>>>>> >> > SGE 62u3 and Torque 2.4.6. I don't have LSF to test, but I added
>>>>> >> > code that *should* work with LSF.
>>>>> >> >
>>>>> >> >> * Have you tested this with Python 2.6?  I saw that you mentioned
>>>>> >> >>   that the engines were shutting down cleanly now.  What did you do
>>>>> >> >>   to fix that?  I am running into that in 0.11 as well, so any info
>>>>> >> >>   you can provide would be helpful.
>>>>> >> >
>>>>> >> > I've been testing the code with Python 2.6. I didn't do anything
>>>>> >> > special other than switch the BatchEngineSet to using job arrays
>>>>> >> > (i.e. a single qsub command instead of N qsubs). Now when I run
>>>>> >> > "ipcluster sge -n 4" the controller starts, the engines are
>>>>> >> > launched, and from that point the ipcluster session runs
>>>>> >> > indefinitely. If I then ctrl-c the ipcluster session it catches the
>>>>> >> > signal and calls kill(), which terminates the engines by canceling
>>>>> >> > the job. Is this the same situation you're trying to get working?
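
To illustrate the job-array idea, here is a sketch (not the actual
BatchEngineSet code): the generated script starts one engine per array
task, so a single qsub launches all N engines and canceling that one
job stops them all.

    # Hypothetical sketch of a job-array submission script for n engines.
    n = 4
    job_script = (
        '#!/bin/sh\n'
        '#$ -V\n'
        '#$ -t 1-%d\n'    # SGE job array: one task per engine
        'ipengine > ipengine.$SGE_TASK_ID.log 2>&1\n'
    ) % n
    # qsub is then invoked once with this script instead of n times.
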
>>>>> >> >
>>>>> >> >> * For now, let's stick with the assumption of a shared $HOME for
>>>>> >> >>   the furl files.
>>>>> >> >> * The biggest thing is for people to test this thoroughly.  I
>>>>> >> >>   don't have SGE/PBS/LSF access right now, so it is a bit difficult
>>>>> >> >>   for me to help.  I have a cluster coming later in the summer, but
>>>>> >> >>   it is not here yet.  Once people have tested it well and are
>>>>> >> >>   satisfied with it, let's merge it.
>>>>> >> >> * If we can update the documentation about how the PBS/SGE support
>>>>> >> >>   works, that would be great.  The file is here:
>>>>> >> >
>>>>> >> > That sounds fine to me. I'm testing this stuff on my workstation's
>>>>> >> > local sge/torque queues and it works fine. I'll also test this with
>>>>> >> > StarCluster and make sure it works on a real cluster. If someone
>>>>> >> > else can test using LSF on a real cluster (with shared $HOME) that'd
>>>>> >> > be great. I'll try to update the docs some time this week.
>>>>> >> >
>>>>> >> >>
>>>>> >> >> Once these small changes have been made and everyone has tested,
>>>>> >> >> we can merge it for the 0.10.1 release.
>>>>> >> > Excellent :D
>>>>> >> >
>>>>> >> >> Thanks for doing this work, Justin and Satra!  It is fantastic!
>>>>> >> >> Just so you all know where this is going in 0.11:
>>>>> >> >>
>>>>> >> >> * We are going to get rid of using Twisted in ipcluster.  This
>>>>> >> >>   means we have to re-write the process management stuff to use
>>>>> >> >>   things like popen.
>>>>> >> >> * We have a new configuration system in 0.11.  This allows users to
>>>>> >> >>   maintain cluster profiles that are a set of configuration files
>>>>> >> >>   for a particular cluster setup.  This makes it easy for a user to
>>>>> >> >>   have multiple clusters configured, which they can then start by
>>>>> >> >>   name.  The logging, security, etc. are also different for each
>>>>> >> >>   cluster profile.
>>>>> >> >> * It will be quite a bit of work to get everything working in 0.11,
>>>>> >> >>   so I am glad we are getting good PBS/SGE support in 0.10.1.
>>>>> >> >
>>>>> >> > I'm willing to help out with the PBS/SGE/LSF portion of ipcluster
>>>>> >> > in 0.11; I guess just let me know when it's appropriate to start
>>>>> >> > hacking.
>>>>> >> >
>>>>> >> > Thanks!
>>>>> >> >
>>>>> >> > ~Justin
>>>>> >> >
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Brian E. Granger, Ph.D.
>>>>> >> Assistant Professor of Physics
>>>>> >> Cal Poly State University, San Luis Obispo
>>>>> >> bgranger at calpoly.edu
>>>>> >> ellisonbg at gmail.com
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Brian E. Granger, Ph.D.
>>>>> Assistant Professor of Physics
>>>>> Cal Poly State University, San Luis Obispo
>>>>> bgranger at calpoly.edu
>>>>> ellisonbg at gmail.com
>>>>
>>>
>>
>
>


