[Python-ideas] Tulip / PEP 3156 - subprocess events

Guido van Rossum guido at python.org
Tue Jan 22 03:31:41 CET 2013


On Mon, Jan 21, 2013 at 1:23 PM, Ben Darnell <ben at bendarnell.com> wrote:
> On Fri, Jan 18, 2013 at 5:15 PM, Guido van Rossum <guido at python.org> wrote:
>>
>> On Thu, Jan 17, 2013 at 11:17 PM, Greg Ewing
>> <greg.ewing at canterbury.ac.nz> wrote:
>> > Paul Moore wrote:
>> >>
>> >> PS From the PEP, it seems that a protocol must implement the 4 methods
>> >> connection_made, data_received, eof_received and connection_lost. For
>> >> a process, which has 2 output streams involved, a single data_received
>> >> method isn't enough.
>>
>> > It looks like there would have to be at least two Transport instances
>> > involved, one for stdin/stdout and one for stderr.
>> >
>> > Connecting them both to a single Protocol object doesn't seem to be
>> > possible with the framework as defined. You would have to use a
>> > couple of adapter objects to translate the data_received calls into
>> > calls on different methods of another object.
>>
>> So far this makes sense.
>>
>> But for this specific case there's a simpler solution -- require the
>> protocol to support a few extra methods, in particular,
>> err_data_received() and err_eof_received(), which are to stderr what
>> data_received() and eof_received() are to stdout. (After all, the
>> point of a subprocess is that "normal" data goes to stdout.) There's
>> only one input stream to the subprocess, so there's no ambiguity for
>> write(), and neither is there a need for multiple
>> connection_made()/lost() methods. (However, we could argue endlessly
>> over whether connection_lost() should be called when the subprocess
>> exits, or when the other ends of all three pipes are closed. :-)
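
Spelled out, that suggestion amounts to one protocol class serving all
three pipes, along these lines (a sketch only -- none of these names
were final):

    class SubprocessProtocol:
        """One protocol object for all three pipes of a child."""

        def connection_made(self, transport):
            self.transport = transport      # write() feeds the child's stdin

        def data_received(self, data):      # the child's stdout
            ...

        def eof_received(self):             # stdout reached EOF
            ...

        def err_data_received(self, data):  # the child's stderr
            ...

        def err_eof_received(self):         # stderr reached EOF
            ...

        def connection_lost(self, exc):
            ...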

> Using separate methods for stderr breaks compatibility with existing
> Protocols for no good reason (UDP needs a different protocol interface
> because individual datagrams can't be concatenated; that doesn't apply here
> since pipes are stream-oriented).  We'll have intermediate Protocol classes
> like LineReceiver that work with sockets; why should they be reimplemented
> for stderr?

This is a good point.
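
For illustration, here's the kind of reuse Ben means: a line-buffering
protocol (hypothetical -- the PEP doesn't define one) that relies only
on the standard method names, so the event loop can attach it to a
socket transport or to a subprocess's stderr pipe without changes:

    class LineReceiver:
        """Buffer incoming bytes; dispatch one complete line at a time."""

        def connection_made(self, transport):
            self.transport = transport
            self.buffer = b''

        def data_received(self, data):
            self.buffer += data
            while b'\n' in self.buffer:
                line, self.buffer = self.buffer.split(b'\n', 1)
                self.line_received(line)

        def eof_received(self):
            if self.buffer:
                self.line_received(self.buffer)

        def connection_lost(self, exc):
            pass

        def line_received(self, line):
            raise NotImplementedError  # subclasses override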

> It's also likely that if I do care about both stdout and
> stderr, I'm going to take stdout as a blob and redirect it to a file, but
> I'll want to read stderr with a line-oriented protocol to get error
> messages, so I don't think we want to favor stdout over stderr in the
> interface.

That all depends rather on the application.
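
Though Ben's particular split is easy enough to picture, reusing the
LineReceiver sketched above -- with the caveat that the
connect_subprocess() call below is invented for the example, since the
PEP doesn't define a subprocess API yet:

    import logging
    import tulip

    class ErrorLines(LineReceiver):
        def line_received(self, line):
            logging.warning('child: %s', line.decode())

    # stdout goes straight to a file, stderr is parsed line by line;
    # connect_subprocess() and its signature are hypothetical.
    loop = tulip.get_event_loop()
    logfile = open('output.dat', 'wb')
    loop.connect_subprocess(['somecmd'], stdout=logfile,
                            stderr=ErrorLines())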

> I think we should have a pipe-based Transport and the subprocess should just
> contain several of these transports (depending on which fds the caller cares
> about; in my experience I rarely have more than one pipe per subprocess, but
> whether that pipe is stdout or stderr varies).  The process object itself
> should also be able to run a callback when the child exits; waiting for the
> standard streams to close is sufficient in most cases but not always.

Unfortunately you'll also need a separate protocol for each transport,
since the transport calls methods with fixed names on the protocol
(and you've just argued that we should stick to that -- and I
agree :-). Note that since there's (normally) only one input file to
the subprocess, only one of these transports should have a write()
method -- but both of them have to call data_received() and
potentially eof_received() on different objects.
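
Roughly like this (a sketch; how the transports get wired up to the
child process is left open):

    class StdoutProtocol:
        """On the stdin/stdout transport -- the only one that writes."""

        def connection_made(self, transport):
            self.transport = transport
            transport.write(b'input for the child\n')  # its stdin

        def data_received(self, data):   # the child's stdout
            print('out:', data)

        def eof_received(self):
            pass

        def connection_lost(self, exc):
            pass

    class StderrProtocol:
        """On the read-only stderr transport -- never writes."""

        def connection_made(self, transport):
            self.transport = transport

        def data_received(self, data):   # the child's stderr
            print('err:', data)

        def eof_received(self):
            pass

        def connection_lost(self, exc):
            pass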

And in this case it doesn't seem easy to use the StreamReader class,
since you can't know which of the two (stdout or stderr) will have
data available first, and guessing wrong might cause a deadlock. (So,
yes, this is a case where coroutines are less convenient than
callbacks.)
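
To spell the hazard out (assuming one StreamReader per pipe; the
stdout_reader and stderr_reader names are mine):

    def drain_both():  # a coroutine, in Tulip's yield-from style
        # If the child fills its stderr pipe buffer while we sit in
        # the first read(), it blocks writing to stderr, never gets
        # to finish stdout, and this coroutine waits forever.
        out = yield from stdout_reader.read()
        err = yield from stderr_reader.read()
        return out, err

Reading whichever pipe has data first is exactly what the callback
style gives you for free.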

-- 
--Guido van Rossum (python.org/~guido)


