[Python-ideas] Async API: some code to review

Guido van Rossum guido at python.org
Tue Oct 30 18:34:12 CET 2012


On Tue, Oct 30, 2012 at 3:12 AM, Laurens Van Houtven <_ at lvh.cc> wrote:
> I've been following the PEP380-related threads and I've reviewed this stuff,
> while trying to do the protocols/transports PEP, and trying to glue the two
> together.

Thanks! I know it can't be easy to keep up with all the threads (and
now code repos).

> The biggest difference I can see is that protocols as they've been discussed
> are "pull": they get called when some data arrives. They don't know how much
> data there is; they just get told "here's some data". The obvious difference
> with the API in, eg:
>
> https://code.google.com/p/tulip/source/browse/sockets.py#56
>
> ... is that now I have to tell a socket to read n bytes, which "blocks" the
> coroutine, then I get some data.

Yes. But do note that sockets.py is mostly a throw-away example
written to support the only style I am familiar with -- synchronous
reads and writes. My point in writing this particular set of
transports is that I want to take existing synchronous code (e.g. a
threaded server built using the stdlib's
socketserver.ThreadingTCPServer class) and make minimal changes to the
protocol logic to support async operation -- those minimal changes
should boil down to using a different way to set up a connection or a
listening socket or constructing a stream from a socket, and putting
"yield from" in front of the blocking operations (recv(), send(), and
the read/readline/write operations on the streams).
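
To make that concrete, here is a rough sketch of the kind of conversion
I have in mind (the trans/rdr objects and their methods follow the
style of sockets.py, but none of this is final API):

    # Synchronous version (socketserver style): blocking calls block a thread.
    def handle_sync(sock):
        f = sock.makefile('rwb')
        line = f.readline()           # blocks the thread
        sock.sendall(line.upper())    # blocks the thread

    # Coroutine version: same logic, with "yield from" on the blocking calls.
    def handle_async(trans, rdr):
        line = yield from rdr.readline()       # suspends this task instead
        yield from trans.send(line.upper())    # suspends this task instead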

I'm still looking for guidance from Twisted and Tornado (and you!) to
come up with better abstractions for transports and protocols. The
underlying event loop *does* support a style where an object registers
a callback function once which is called repeatedly, as long as the
socket is readable (or writable, depending on the registration call).
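
Roughly, that callback style looks like this (add_reader/remove_reader
are stand-in names for the registration call; the actual eventloop
methods may be spelled differently):

    def watch_socket(eventloop, sock, proto):
        # Registered once; the event loop calls this back whenever sock
        # is readable.  sock is assumed non-blocking.
        def on_readable():
            data = sock.recv(4096)
            if data:
                proto.data_received(data)    # hand the bytes to the protocol
            else:
                eventloop.remove_reader(sock.fileno())
                proto.connection_lost(None)
        eventloop.add_reader(sock.fileno(), on_readable)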

> Now, there doesn't have to be an issue; you could simply say:
>
> data = yield from s.recv(4096) # that's the magic number usually right
> proto.data_received(data)

(Off-topic: ages ago I determined that the optimal block size is
actually 8192. But for all I know it is 256K these days. :-)

> It seems a bit boilerplatey, but I suppose that eventually could be hidden
> away.
>
> But this style is pervasive, for example that's how reading by lines works:
>
> https://code.google.com/p/tulip/source/browse/echosvr.py#20

Right -- again, this is all geared towards making it palatable for
people used to writing synchronous code (either single-threaded or
multi-threaded), not for people used to Twisted.

> While I'm not a big fan (I may be convinced if I see a protocol test that
> looks nice);

Check out urlfetch() in main.py:
http://code.google.com/p/tulip/source/browse/main.py#39

For sure, this isn't "pretty" and it should be rewritten using more
abstraction -- I only wrote the entire thing as a single function
because I was focused on the scheduler and event loop. And it is
clearly missing a buffering layer for writing (it currently uses a
separate send() call for each line of the HTTP headers, blech). But it
implements a fairly complex (?) protocol and it performs well enough.
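
For reference, the shape of that code is roughly this (a simplified
sketch, not the actual main.py; create_connection() is a hypothetical
setup helper, and BufferedReader follows the names used elsewhere in
this thread):

    def urlfetch_sketch(host, path='/'):
        trans = yield from create_connection(host, 80)   # hypothetical helper
        rdr = BufferedReader(trans)
        request = 'GET {} HTTP/1.0\r\nHost: {}\r\n\r\n'.format(path, host)
        # One send() per line -- exactly the missing-buffering wart noted above.
        for line in request.splitlines(True):
            yield from trans.send(line.encode('latin-1'))
        status = yield from rdr.readline()
        headers = []
        while True:
            line = yield from rdr.readline()
            if line in (b'\r\n', b'\n', b''):
                break
            headers.append(line)
        return status, headers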

> I'm just wondering if there's any point in trying to write the
> pull-style protocols when this works quite differently.

Perhaps you could try to write some pull-style transports and
protocols for tulip to see if anything's missing from the scheduler
and eventloop APIs or implementations? I'd be happy to rename
sockets.py to push_sockets.py so there's room for a competing
pull_sockets.py, and then we can compare apples to apples.

(Unlike the yield vs. yield-from issue, where I am very biased, I am
not biased about push vs. pull style. I just coded up what I was most
familiar with first.)

> Additionally, I'm not sure if readline belongs on the socket.

It isn't -- it is on the BufferedReader, which wraps around the socket
(or other socket-like transport, like SSL). This is similar to the way
the stdlib socket.socket class has a makefile() method that returns a
stream wrapping the socket.

> I understand the simile with files, though.

Right, that's where I've gotten most of my inspiration. I figure they
are a good model to lure unsuspecting regular Python users in. :-)

> With the coroutine style I could see how the
> most obvious fit would be something like tornado's read_until, or an
> as_lines that essentially calls read_until repeatedly. Can the delimiter for
> this be modified?

You can write your own BufferedReader, and if this is a common pattern
we can make it a standard API. Unlike the SocketTransport and
SslTransport classes, which contain various I/O hacks and integrate
tightly with the polling capability of the eventloop, I consider
BufferedReader plain user code. Antoine also hinted that with not too
many changes we could reuse the existing buffering classes in the
stdlib io module, which are implemented in C.
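
As a sketch, such a reader can be written purely in terms of the
transport's recv() coroutine, with the delimiter as a parameter --
plain user code, nothing here depends on eventloop internals
(DelimitedReader is a hypothetical name, not anything in tulip):

    class DelimitedReader:
        def __init__(self, trans, blocksize=8192):
            self.trans = trans
            self.blocksize = blocksize
            self.buffer = b''

        def read_until(self, delimiter=b'\r\n'):
            # Coroutine: recv() until the delimiter (or EOF) shows up.
            while True:
                i = self.buffer.find(delimiter)
                if i >= 0:
                    i += len(delimiter)
                    chunk, self.buffer = self.buffer[:i], self.buffer[i:]
                    return chunk
                data = yield from self.trans.recv(self.blocksize)
                if not data:                     # EOF: return what's left
                    chunk, self.buffer = self.buffer, b''
                    return chunk
                self.buffer += data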

> My main syntactic gripe is that when I write @inlineCallbacks code or
> monocle code or whatever, when I say "yield" I'm yielding to the reactor.
> That makes sense to me (I realize natural language arguments don't always
> make sense in a programming language context). "yield from" less so (but
> okay, that's what it has to look like). But this just seems weird to me:
>
> yield from trans.send(line.upper())
>
> Not only do I not understand why I'm yielding there in the first place (I
> don't have to wait for anything, I just want to push some data out!), it
> feels like all of my yields have been replaced with yield froms for no
> obvious reason (well, there are reasons, I'm just trying to look at this
> naively).

Are you talking about yield vs. yield-from here, or about the need to
suspend every write? Regarding yield vs. yield-from, please squint and
get used to seeing yield-from everywhere -- the scheduler
implementation becomes *much* simpler and *much* faster using
yield-from, so much so that there really is no competition.
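
To illustrate with a toy (not the real scheduler, which also has to
wait on I/O): because nested calls use yield-from, the event loop only
ever resumes the outermost generator; there is no trampoline juggling a
stack of generators on every step.

    import collections

    class ToyScheduler:
        def __init__(self):
            self.ready = collections.deque()

        def add_task(self, gen):
            self.ready.append(gen)

        def run(self):
            while self.ready:
                task = self.ready.popleft()
                try:
                    # One send() resumes the task no matter how deep its
                    # yield-from chain goes; with plain yield the scheduler
                    # would have to manage that call stack itself.
                    task.send(None)
                except StopIteration:
                    continue
                self.ready.append(task)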

As to why you would have to suspend each time you call send(), that's
mostly just an artefact of the incomplete example -- I didn't
implement a BufferedWriter yet. I also have some worries about a task
producing data at a rate faster than the socket can drain it from the
buffer, but in practice I would probably relent and implement a
write() call that returns immediately and should *not* be used with
yield-from. (Unfortunately you can't have a call that works with or
without yield-from.) I think there's a throttling mechanism in Twisted
that can probably be copied here.
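
Sketching what I mean (nothing like this exists in tulip yet; the names
and behavior are guesses): write() would just append to a buffer and
return, and a separate drain() coroutine would give the scheduler a
chance to apply back-pressure.

    class BufferedWriter:
        def __init__(self, trans, high_water=64*1024):
            self.trans = trans
            self.high_water = high_water
            self.buffer = b''

        def write(self, data):
            # Plain call -- do NOT use yield from with this.
            self.buffer += data
            return len(self.buffer) >= self.high_water   # hint: time to drain

        def drain(self):
            # Coroutine: flush everything buffered so far; a task that
            # produces data too fast yields from this to throttle itself.
            while self.buffer:
                chunk, self.buffer = self.buffer, b''
                yield from self.trans.send(chunk)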

> I guess Twisted gets away with this because of deferred chaining: that one
> deferred might have tons of callbacks in the background, many of which also
> doing IO operations, resulting in a sequence of asynchronous operations that
> only at the end cause the generator to be run some more.
>
> I guess that belongs in a different thread, though. Even, then, I'm not sure
> if I'm uncomfortable because I'm seeing something different from what I'm
> used to, or if my argument from English actually makes any sense whatsoever.
>
> Speaking of protocol tests, what would those look like? How do I yell, say,
> "POST /blah HTTP/1.1\r\n" from a transport? Presumably I'd have a mock
> transport, and call the handler with that? (I realize it's early days to be
> thinking that far ahead; I'm just trying to figure out how I can contribute
> a good protocol definition to all of this).

Actually I think the ease of writing tests should definitely be taken
into account when designing the APIs here. In the Zope world, Jim
Fulton wrote a simple abstraction for networking code that explicitly
provides for testing: http://packages.python.org/zc.ngi/ (it also
supports yield-style callbacks, similar to Twisted's inlineCallbacks).
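
As for feeding "POST /blah HTTP/1.1\r\n" into a handler, something
along these lines seems plausible (just a sketch of the mock-transport
idea; none of this exists):

    class MockTransport:
        # Fake transport: serves canned bytes to recv() and records
        # everything the protocol send()s, so a handler coroutine can be
        # exercised without real sockets or the event loop.
        def __init__(self, incoming):
            self.incoming = incoming
            self.sent = []

        def recv(self, n):
            yield from []     # no-op; makes this usable with "yield from"
            data, self.incoming = self.incoming[:n], self.incoming[n:]
            return data

        def send(self, data):
            yield from []
            self.sent.append(data)
            return len(data)

    def run_to_completion(coro):
        # Good enough for tests whose transports never actually block.
        try:
            while True:
                next(coro)
        except StopIteration as exc:
            return exc.value

A test would construct MockTransport(b"POST /blah HTTP/1.1\r\n..."),
run the handler against it to completion, and assert on trans.sent.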

I currently don't have any tests, apart from manually running main.py
and checking its output. I am a bit hesitant to add unit tests in this
early stage, because keeping the tests passing inevitably slows down
the process of ripping apart the API and rebuilding it in a different
way -- something I do at least once a day, whenever I get feedback or
a clever thought strikes me or something annoying reaches my trigger
level.

But I should probably write at least *some* tests; I'm sure it will be
enlightening and I will end up changing the APIs to make testing
easier. It's in the TODO.

-- 
--Guido van Rossum (python.org/~guido)


