A bit of a boggle about subprocess.poll() and the codes it receives from a process

Sat Sep 10 05:19:08 EDT 2011

On Fri, Sep 9, 2011 at 11:02 PM, J <dreadpiratejeff at gmail.com> wrote:
> Hi,
> I need a bit of help sorting this out...
> I have a memory test script that is a bit of compiled C.  The test itself
> can only ever return a 0 or 1 exit code, this is explicitly coded and there
> are no other options.
> I also have a wrapper test script that calls the C program that should also
> only return 0 or 1 on completion.
> The problem I'm encountering, however, involves the return code when
> subprocess.poll() is called against the running memory test process.  The
> current code in my wrapper program looks like this:
> def run_processes(self, number, command):
>         passed = True
>         pipe = []
>         for i in range(number):
>             pipe.append(self._command(command))
>             print "Started: process %u pid %u: %s" % (i, pipe[i].pid, command)
>         sys.stdout.flush()
>         waiting = True
>         while waiting:
>             waiting = False
>             for i in range(number):
>                 if pipe[i]:
>                     line = pipe[i].communicate()[0]
>                     if line and len(line) > 1:
>                         print "process %u pid %u: %s" % (i, pipe[i].pid, line)
>                         sys.stdout.flush()
>                     if pipe[i].poll() == -1:
>                         waiting = True
>                     else:
>                         return_value = pipe[i].poll()
>                         if return_value != 0:
>                             print "Error: process  %u pid %u retuned %u" % (i, pipe[i].pid, return_value)
>                             passed = False
>                         print "process %u pid %u returned success" % (i, pipe[i].pid)
>                         pipe[i] = None
>         sys.stdout.flush()
>         return passed
> So what happens here is that in the waiting loop, if pipe[i].poll returns a
> -1, we keep waiting, and then if it returns anything OTHER than -1, we exit
> and return the return code.

Does self._command return a subprocess.Popen object?  The
documentation at
http://docs.python.org/library/subprocess.html#subprocess.Popen.poll
says that the poll method sets and returns the returncode attribute.
returncode is None until the process terminates, and a negative value
-N indicates that the subprocess was terminated by signal N (Unix
only).  So poll() does not return -1 while the process is still
running; it returns None.  If poll() returns -1, it means that the
process was terminated by signal number 1 (probably SIGHUP).
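
To illustrate the poll() return values described above, here is a
minimal sketch.  It assumes a POSIX system and uses the Python
interpreter itself as a stand-in child process; the variable names are
made up for the example and do not come from your script.

```python
import subprocess
import sys

# Stand-in child that sleeps long enough to be observed while running.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])

# While the child is alive, poll() returns None (not -1).
running_status = child.poll()

# terminate() sends SIGTERM on POSIX; after wait(), returncode is the
# negated signal number.
child.terminate()
child.wait()
signalled_status = child.returncode

# A child that exits on its own reports its ordinary exit status.
exited = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(1)"])
exited.wait()
exit_status = exited.poll()

print("%r %r %r" % (running_status, signalled_status, exit_status))
```

A loop that waits on "poll() == -1" would therefore stop on the first
pass, since a running process reports None.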

> BUT, I'm getting, in some cases, a return code of 127, which is impossible
> to get from the memory test program.
> The output from this bit of code looks like this in a failing situation:
> Error: process 0 pid 2187 retuned 127
> process 0 pid 2187 returned success
> Error: process 1 pid 2188 retuned 127
> process 1 pid 2188 returned success
> I'm thinking that I'm hitting some sort of race here where the kernel is
> reporting -1 while the process is running, then returns 127 or some other
> status when the process is being killed and then finally 0 or 1 after the
> process has completely closed out.  I "think" that the poll picks up this
> intermediate exit status and immediately exits the loop, instead of waiting
> for a 0 or 1.
> I've got a modified version that I'm getting someone to test for me now that
> changes
>  if pipe[i].poll() == -1:
>      waiting = True
> to this
> if pipe[i].poll() not in [0,1]:
>     waiting = True
> So my real question is: am I on the right track here, and am I correct in my
> guess that the kernel is reporting different status codes to
> subprocess.poll() during the shutdown of the polled process?
>

I'm unaware of any such race condition.  It is more likely that the
program you are running can return such error codes.  Perhaps you
should examine its stderr output.
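
One possibility worth ruling out, offered only as a guess since I
don't know what self._command does: if the command is started through
a shell (shell=True), a POSIX shell exits with status 127 when it
cannot find the command.  The sketch below shows that behaviour and
captures stderr so the shell's message is visible.  The command name
is hypothetical and deliberately nonexistent.

```python
import subprocess

# Hypothetical command name, chosen so that the shell cannot find it.
missing = "no-such-memtest-binary-xyz"

# Assumption: the wrapper launches the command via the shell.
proc = subprocess.Popen(missing, shell=True,
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()

# On a POSIX shell the status is 127 and stderr holds the shell's
# "not found" message.
print("returncode: %d" % proc.returncode)
print("stderr: %r" % err)
```

If the stderr of your failing runs shows a similar message, the 127 is
coming from the launcher rather than from the memory test itself.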

It's unclear what you're trying to do here.  If you'd just like to
wait until all the started processes are finished, you should use the
wait() method instead of the poll() method.  You should also note that
the communicate() method already waits for the process to complete, so
further calls to poll() and wait() are superfluous.  Normally, after
communicate() returns, you would simply check pipe[i].returncode and
be on your way.
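
As a rough sketch of that approach (not a drop-in replacement for your
method), the loop could look something like this.  It assumes the
processes are created with subprocess.Popen and stdout=subprocess.PIPE;
the stand-in commands at the bottom are only there to make the example
runnable.

```python
import subprocess
import sys

def run_processes(number, command):
    """Start `number` copies of `command`; return True if all exit with 0."""
    procs = [subprocess.Popen(command, stdout=subprocess.PIPE)
             for _ in range(number)]
    passed = True
    for i, proc in enumerate(procs):
        # communicate() reads all output and blocks until the child exits,
        # so returncode is already set when it returns.
        out, _ = proc.communicate()
        if out:
            print("process %d pid %d: %r" % (i, proc.pid, out))
        if proc.returncode != 0:
            print("Error: process %d pid %d returned %d"
                  % (i, proc.pid, proc.returncode))
            passed = False
        else:
            print("process %d pid %d returned success" % (i, proc.pid))
    return passed

# Stand-in commands: one that succeeds and one that exits with status 1.
ok_cmd = [sys.executable, "-c", "print('ok')"]
bad_cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]
all_ok = run_processes(2, ok_cmd)
one_bad = run_processes(1, bad_cmd)
```

Using %d rather than %u for the return code also keeps a negative
(signal) value readable in the error message.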

-- 
regards,
kushal