Clueless: piping between 2 non-python processes

Sun Oct 26 01:00:17 EST 2003

On Sun, Oct 26, 2003 at 05:08:44AM -0000, Donn Cave wrote:
> Quoth Andrew Bennetts <andrew-pythonlist at puzzling.org>:
> | On Sat, Oct 25, 2003 at 08:16:33PM +0200, Michael Lehmeier wrote:
> ...
> | > So the basic problem is:
> | > - create two processes A and B
> | > - pipe from A to B
> | > - terminate A when B ends
> |
> | I think you can do something like (untested):
> 
> Well, this is really more than he asked for.  All we need here is 
> the same thing as the shell does with 'a | b', but instead of forking
> a from b as the shell would normally do it, both processes need to
> be children of the Python program so that it can wait for either.
> [More comments interleaved.]
> 
> |     import os, signal
> |     
> |     pathToA = '/usr/bin/A'
> |     pathToB = '/usr/bin/B'
> |     
> |     # Create pipes
> |     readA, writeB = os.pipe()
> |     readB, writeA = os.pipe()
> 
> ... We only need one pipe here, the second one.

For his application, where the data is only flowing one-way, that's true.  I
was being unnecessarily general here.

> |     # Create process A
> |     pidA = os.fork()
> |     if pidA == 0:   # child
> |         os.dup2(readA.fileno(), 0)  # set read pipe to stdin
> ... omit the above line.
> |         os.dup2(writeA.fileno(), 1) # set write pipe to stdout
> ... readA and writeA will be integer unit numbers, so omit ".fileno()"

Oops, yes.  In my haste I assumed os.pipe() returned file objects rather
than raw file descriptors.
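
To make the difference concrete, here is a small sketch (POSIX assumed; the
helper name pipe_roundtrip is made up for illustration) showing that os.pipe()
hands back plain integers, which go straight to os.write(), os.read() and
os.dup2() with no .fileno() call:

```python
import os

def pipe_roundtrip(payload):
    """Send a small bytes payload through a fresh pipe using raw descriptors.

    Assumes payload fits in the kernel's pipe buffer; a larger write would
    block here because nothing is reading concurrently.
    """
    read_fd, write_fd = os.pipe()
    # Both ends are plain ints, not file objects.
    assert isinstance(read_fd, int) and isinstance(write_fd, int)
    os.write(write_fd, payload)
    os.close(write_fd)  # closing the write end lets the reader see EOF
    data = b""
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:  # empty read means EOF
            break
        data += chunk
    os.close(read_fd)
    return data
```

If file objects are wanted instead, os.fdopen() wraps a descriptor in one.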

Also, ideally you'd close all other file descriptors in the child process
apart from these pipes, to avoid allowing the child processes to muck with
files that the parent has open.
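
A rough sketch of that cleanup, meant to run in the child after the dup2
calls and before the exec.  The helper name close_fds_except and the bound of
256 are illustrative assumptions, not anything from the code quoted above:

```python
import os

def close_fds_except(keep, max_fd=256):
    """Close every descriptor from 3 up to max_fd - 1 that is not in keep.

    max_fd=256 is an arbitrary illustrative bound, not a real system limit.
    Descriptors 0, 1 and 2 (stdin, stdout, stderr) are always left alone.
    """
    for fd in range(3, max_fd):
        if fd in keep:
            continue
        try:
            os.close(fd)
        except OSError:
            pass  # fd was not open in the first place
```

On Python 3.4 and later, descriptors created by Python itself are
non-inheritable by default, so the exec closes them anyway; an explicit sweep
like this mostly matters on older versions or for descriptors that were
deliberately marked inheritable.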

> 
> |         os.execl(pathToA)
> ... os.execl(pathToA, pathToA)

Ah, good point.

> ... Better enclose the whole child fork's Python code in try/finally,
> ... with os._exit(113) in the finally block (where 113 is some distinctive
> ... number.)  Otherwise exceptions will branch back out of this block into
> ... code that you intended for the parent.

Yeah, that's safer, although I can't think of a reason why an exception
would be raised here (but better safe than sorry).

For that matter, a call to "sys.settrace(None)" in the child is probably a
good idea, in case he ever tries to step through the code with pdb...
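
Putting the corrections so far together, the fork/exec part might look
something like the sketch below.  It uses the single pipe, raw descriptors,
the program path repeated as argv[0], and a try/finally ending in
os._exit(113).  The function names spawn and start_pipeline and their
parameters are made up for illustration, and it has only been thought
through for a POSIX system:

```python
import os
import sys

def spawn(path, args, stdin_fd=None, stdout_fd=None, close_fds=()):
    """Fork and exec path; return the child's pid to the parent."""
    pid = os.fork()
    if pid != 0:
        return pid  # parent: hand back the child's pid
    try:
        sys.settrace(None)  # keep a debugger's trace hook out of the child
        if stdin_fd is not None:
            os.dup2(stdin_fd, 0)   # pipe's read end becomes stdin
        if stdout_fd is not None:
            os.dup2(stdout_fd, 1)  # pipe's write end becomes stdout
        for fd in close_fds:
            if fd > 2:             # never close the freshly wired stdio
                os.close(fd)
        os.execv(path, [path] + list(args))  # argv[0] is the program itself
    finally:
        os._exit(113)  # only reached if something above raised


def start_pipeline(path_a, args_a, path_b, args_b):
    """Start 'A | B' with both processes as children; return (pid_a, pid_b)."""
    read_fd, write_fd = os.pipe()  # the single pipe: A writes, B reads
    both = (read_fd, write_fd)
    pid_a = spawn(path_a, args_a, stdout_fd=write_fd, close_fds=both)
    pid_b = spawn(path_b, args_b, stdin_fd=read_fd, close_fds=both)
    # The parent must close its copies, or B never sees EOF when A exits.
    os.close(read_fd)
    os.close(write_fd)
    return pid_a, pid_b
```

A failed exec (say, a wrong path) then shows up to the parent as exit status
113 instead of a second copy of the parent's code running in the child.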

> |     # Wait for B to terminate
> |     os.waitpid(pidB, 0)
> 
> OK, this is where the fun starts.  I understood the problem to be
> that B may fail to exit when A exits, so I think we really want
> to wait for A.  Or if the converse may also happen, then we need
> to wait for either and then dispatch the other.  In any case I think
> this is not too hard to figure out.  The most useful trick here is
> the os.WNOHANG flag to waitpid, which will allow one or more waits
> without blocking to see if B is really going to exit on its own.
> You don't want to kill it unless you're fairly sure it's necessary,
> assuming it's doing something useful enough to justify running it
> in the first place.  kill should also be surrounded with try/except,
> because on some platforms it may be an error to kill a process that
> has exited even if it still hasn't been reaped.

Well, if these are the only child processes his program spawns, he can
afford to just use os.wait, e.g. something like:

    pid, status = os.wait()
    if pid == pidA:
        otherpid = pidB
    elif pid == pidB:
        otherpid = pidA
    else:
        assert 0, "This isn't supposed to happen"

    try:
        os.kill(otherpid, signal.SIGTERM)
    except OSError:
        # Already dead, it seems
        pass

    # Make sure to reap both children
    os.waitpid(otherpid, 0)  

But polling using the os.WNOHANG flag would work too, although polling
always feels less elegant to me.
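
For comparison, a polling version could look roughly like this.  It gives the
remaining process a grace period to exit on its own before sending SIGTERM.
The name reap_with_grace and the grace/interval defaults are illustrative
assumptions, to be tuned for the real programs:

```python
import os
import signal
import time

def reap_with_grace(pid, grace=2.0, interval=0.05):
    """Poll for pid to exit; SIGTERM it if still running after grace seconds.

    Returns the wait status reported by os.waitpid().
    """
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        if done_pid == pid:
            return status  # exited on its own, nothing to kill
        time.sleep(interval)  # waitpid returned (0, 0): still running
    try:
        os.kill(pid, signal.SIGTERM)
    except OSError:
        pass  # some platforms refuse to signal an exited, unreaped child
    _, status = os.waitpid(pid, 0)  # blocking wait reaps the child
    return status
```

After os.wait() reports the first child, calling reap_with_grace(otherpid)
would replace the kill-then-waitpid pair above.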

> | I've probably stuffed up some details, but I'm pretty sure that that's the
> | basic idea.
> 
> Well, it's not much worse than the one that proposed a thread.

:)

> The bi-directional pipes you set up there can be a good thing, in
> a case where that's what you need, but even then they're extremely
> brittle, because pipes have a fixed, limited buffer size and because
> C library I/O (including Python's fileobject) employs process internal
> block buffering when writing to pipes.  The former means the pipe can
> fill up when the reading process is dilatory, the latter means the
> pipe may be empty at a point where the writing process has logically
> written to it.  Often enough you find both conditions together.

If buffering is a problem, the processes communicating via pipes are welcome
to call fflush() or change their I/O library's buffer settings as needed.
This isn't significantly different to the problems you can encounter with
TCP sockets, unless I'm misunderstanding you.
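
A small sketch of the buffering effect and the flush that cures it, using
Python's own file objects in place of C stdio (the function names are made up
for illustration, and CPython 3 on a POSIX system is assumed):

```python
import os
import select

def pipe_has_data(fd):
    """Return True if a read on fd would not block right now."""
    readable, _, _ = select.select([fd], [], [], 0)
    return bool(readable)

def demo_buffering():
    """Return (visible_before_flush, visible_after_flush) for one write."""
    read_fd, write_fd = os.pipe()
    # A pipe is not a tty, so the file object is block-buffered.
    writer = os.fdopen(write_fd, "w")
    try:
        writer.write("hello\n")         # sits in the writer's own buffer
        before = pipe_has_data(read_fd)
        writer.flush()                  # now the bytes reach the pipe
        after = pipe_has_data(read_fd)
        return before, after
    finally:
        writer.close()
        os.close(read_fd)
```

Here demo_buffering() should return (False, True): the line is logically
written but the pipe stays empty until the flush.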

-Andrew.