[Python-ideas] Tulip / PEP 3156 - subprocess events

Sun Jan 20 05:35:04 CET 2013

On Sat, Jan 19, 2013 at 5:51 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> But the trade-off in separating protocol creation from notification of
> the connection is that it means every *protocol* has to be written to
> handle the "no connection yet" gap between __init__ and the call to
> connection_made.

That doesn't strike me as a problematic design. I've seen it plenty of times.

> However, if we instead delay the call to the protocol factory until
> *after the connection is made*, then most protocols can be written
> assuming they always have a connection (at least until connection_lost
> is called). A persistent protocol that spanned multiple
> connect/reconnect cycles could be written such that you passed
> "my_protocol.connection_made" as the protocol factory, while normal
> protocols (that last only the length of a single connection) would
> pass "MyProtocol" directly.

Well, almost. connection_made() would have to return self to make this
work. But we could certainly add some other method that did that.
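
A minimal sketch of such a helper, for illustration only. The class name
PersistentProtocol and the method name attach() are hypothetical and appear
in neither the PEP nor Tulip, and the connection_lost(exc) signature is
assumed. The bound method is passed as the protocol factory, so the same
instance is handed back for every connection.

```python
class PersistentProtocol:
    """Hypothetical protocol that survives connect/reconnect cycles."""

    def __init__(self):
        self.transport = None
        self.connections = 0

    def connection_made(self, transport):
        self.transport = transport
        self.connections += 1

    def connection_lost(self, exc):
        self.transport = None

    def attach(self, transport):
        # Hypothetical helper: same work as connection_made(), but it
        # returns self so it can serve as protocol_factory(transport).
        self.connection_made(transport)
        return self


my_protocol = PersistentProtocol()
protocol_factory = my_protocol.attach

# Two connections in a row; object() stands in for a real transport.
first = protocol_factory(object())
my_protocol.connection_lost(None)
second = protocol_factory(object())
```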

(At first I thought it would be harder to pass other parameters to the
constructor for the non-reconnecting case, but the solution is about
the same as before -- use a partial function or a lambda that takes a
protocol and calls the constructor with that and whatever other
parameters it wants to pass.)
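
As a rough sketch of that, assuming the proposed protocol_factory(transport)
calling convention: EchoProtocol, FakeTransport and the greeting parameter
are made-up names for illustration, and the "event loop" is just a direct
call.

```python
from functools import partial


class FakeTransport:
    """Stand-in transport that only records what is written to it."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class EchoProtocol:
    """Hypothetical protocol whose constructor takes the transport."""

    def __init__(self, transport, greeting):
        self.transport = transport
        self.greeting = greeting
        transport.write(greeting)


# Either form yields a one-argument callable taking the transport.
partial_factory = partial(EchoProtocol, greeting=b"hello\n")
lambda_factory = lambda transport: EchoProtocol(transport, b"hello\n")

transport = FakeTransport()
protocol = partial_factory(transport)  # what the event loop would do
```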

> At the transport layer, the two states "has a protocol" and "has a
> connection" could then be collapsed into one - if there is a
> connection, then there will be a protocol, and vice-versa. This
> differs from the current status in PEP 3156, where it's possible for a
> transport to have a protocol without a connection if it calls the
> protocol factory well before calling connection_made.

This doesn't strike me as important. The code I've written for Tulip
puts most of the connection-making code outside the transport, and the
transport constructor is completely private. Every transport
implementation is completely free in how it works, and every event
loop implementation is free to put as much or as little of the
connection set-up in the transport as it wants to. The same is true
for transports written by users (and there will be some of these). The
*only* thing we care about for transports is that the thing passed to
the protocol's connection_made() has the methods specified by the PEP
(write(), writelines(), pause(), resume(), and a few more). Also, it
does not matter one iota whether it is the transport or some other
entity that calls the protocol's methods (connection_made(),
data_received(), etc.) -- the only thing that matters is the order in
which they are called.
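
To illustrate the point about ordering, here is a small sketch in which a
plain function, not the transport, drives the protocol. RecordingProtocol,
MinimalTransport and drive() are hypothetical names, the transport provides
only write(), and the connection_lost(exc) signature is assumed.

```python
class RecordingProtocol:
    """Hypothetical protocol that logs the order of its callbacks."""

    def __init__(self):
        self.events = []

    def connection_made(self, transport):
        self.events.append("connection_made")

    def data_received(self, data):
        self.events.append(("data_received", data))

    def connection_lost(self, exc):
        self.events.append("connection_lost")


class MinimalTransport:
    """Anything with the right methods will do; only write() is shown."""

    def __init__(self):
        self.sent = []

    def write(self, data):
        self.sent.append(data)


def drive(protocol, transport, chunks):
    # Some entity other than the transport makes the calls; the protocol
    # only sees the order in which they arrive.
    protocol.connection_made(transport)
    for chunk in chunks:
        protocol.data_received(chunk)
    protocol.connection_lost(None)


protocol = RecordingProtocol()
drive(protocol, MinimalTransport(), [b"a", b"b"])
```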

IOW, even though a transport may "have" a protocol without a
connection, nobody should care about that state, and nobody should be
calling its methods (again, write() etc.) in that state. In fact,
nobody except event loop internal code should ever have a reference to
a transport in that state. (The transport that is returned by
create_connection() is fully connected to the socket (or whatever
might take its place) as well as to the protocol.)

I think we can make the same assumptions for transports implemented by
user code.

> Now, it may be that *there's a good reason* why conflating "has a
> protocol" and "has a connection" at the transport layer is a bad idea,
> and thus we actually *need* the "protocol creation" and "protocol
> association with a connection" events to be distinct. However, the PEP
> currently doesn't explain *why* it's necessary to separate the two,
> hence the confusion for at least Greg, Ben and myself.

So, your whole point here seems to be that you'd rather see the PEP
specify that the sequence when a connection is made is

  protocol = protocol_factory(transport)

rather than

  protocol = protocol_factory()
  protocol.connection_made(transport)

I looked in the Tulip code to see whether this would cause any
problems. I think it could be done, but the solution would feel a
little awkward to me, because currently the protocol's
connection_made() method is not called directly by the transport: it
is called indirectly via the event loop's call_soon() method. So using
your approach the transport wouldn't have a protocol attribute until
this callback is called -- or we'd have to change things to call it
directly rather than via call_soon(). Now I'm pretty sure I can prove
that nothing will be referencing the protocol *before* the
connection_made() call is actually made, and also that directly
calling it instead of using call_soon() is fine. But nevertheless the
transport code would feel a little harder to reason about.
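
A toy model of that situation, not Tulip's actual code: ToyLoop,
ToyTransport and Proto are invented for illustration, and only call_soon()
is modelled. With the factory taking the transport and being scheduled via
call_soon(), the transport's protocol attribute stays None until the
callback runs.

```python
from collections import deque


class ToyLoop:
    """Tiny stand-in for an event loop; only call_soon() is modelled."""

    def __init__(self):
        self._ready = deque()

    def call_soon(self, callback, *args):
        self._ready.append((callback, args))

    def run_pending(self):
        while self._ready:
            callback, args = self._ready.popleft()
            callback(*args)


class ToyTransport:
    def __init__(self, loop, protocol_factory):
        self.protocol = None
        # The factory needs the transport, and the call is deferred, so
        # no protocol exists yet when the constructor returns.
        loop.call_soon(self._make_protocol, protocol_factory)

    def _make_protocol(self, protocol_factory):
        self.protocol = protocol_factory(self)


class Proto:
    def __init__(self, transport):
        self.transport = transport


loop = ToyLoop()
transport = ToyTransport(loop, Proto)
before = transport.protocol   # still None: the callback has not run
loop.run_pending()
after = transport.protocol    # now a Proto bound to the transport
```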

> Given that new protocol implementations should be significantly more
> common than new transport implementations, there's a strong case to be
> made for pushing any required complexity into the transports.

TBH I don't see the protocol implementation getting any simpler
because of this. There is some protocol initialization code that
doesn't depend on the transport, and some that does. Using your
approach, these all go in __init__(). Using the PEP's current
proposal, the latter go in a separate method, connection_made(). But
using your approach, writing the lambda or partial function that calls
the constructor with the right arguments (to be passed as
protocol_factory) becomes a tad more complex, since now it must take a
transport argument. On the third hand, rigging things so that a
pre-existing protocol instance can be reused becomes a little harder
to figure out, since you have to write a helper method that takes a
transport and returns the protocol (i.e., self).

All in all I see it as six of one, half a dozen of the other, and I am
happy with Glyph's testimony that the Twisted design works well in
practice.

-- 
--Guido van Rossum (python.org/~guido)