[Python-ideas] Async API: some code to review

Guido van Rossum guido at python.org
Mon Oct 29 18:03:00 CET 2012


On Mon, Oct 29, 2012 at 9:07 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Sun, 28 Oct 2012 16:52:02 -0700,
> Guido van Rossum <guido at python.org> wrote:
>> The event list started out as a tuple of (fd, flag, callback, args),
>> where flag is 'r' or 'w' (easily extensible); in practice neither the
>> fd nor the flag are used, and one of the last things I did was to wrap
>> callback and args into a simple object that allows cancelling the
>> callback; the add_*() methods return this object. (This could probably
>> use a little more abstraction.) Note that poll() doesn't call the
>> callbacks -- that's up to the event loop.
>
> I don't understand why the pollster takes callback objects if it never
> calls them. Also the fact that it wraps them into DelayedCalls is more
> mysterious to me. DelayedCalls represent one-time cancellable callbacks
> with a given deadline, not callbacks which are called any number of
> times on I/O events and that you can't cancel.

Yeah, this part definitely needs reworking. In the current design the
pollster is a base class of the eventloop, and the latter *does* call
them; but I want to refactor that anyway. I'll probably end up with a
pollster that registers (what are to it) opaque tokens and returns
just a list of tokens from poll(). (Unrelated: would it be useful if
poll() was an iterator?)

>> scheduling.py:
>> http://code.google.com/p/tulip/source/browse/scheduling.py
>>
>> This is the scheduler for PEP-380 style coroutines. I started with a
>> Scheduler class and operations along the lines of Greg Ewing's design,
>> with a Scheduler instance as a global variable, but ended up ripping
>> it out in favor of a Task object that represents a single stack of
>> generators chained via yield-from. There is a Context object holding
>> the event loop and the current task in thread-local storage, so that
>> multiple threads can (and must) have independent event loops.
>
> YMMV, but I tend to be wary of implicit thread-local storage. What if
> someone runs a function or method depending on that thread-local
> storage from inside a thread pool? Weird bugs ensue.

Agreed, I had to figure out one of these in the implementation of
call_in_thread() and it wasn't fun.

I don't know what else to do -- I think it's probably best if I base
my implementation on this for now so that I know it works correctly in
such an environment. In the end there will probably be an API to get
the current context and another to influence how that API gets it, so
people can plug in their own schemes, from TLS to a simple global to
something determined by an external library.
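
Roughly, I'm imagining something like this (a sketch only; the names
get_context() and set_context_getter() are made up):

    import threading

    class Context:
        pass  # stand-in for the real Context (event loop + current task)

    _tls = threading.local()
    _context_getter = None  # replaceable policy hook

    def _default_getter():
        # Default policy: one Context per thread, created on demand.
        try:
            return _tls.context
        except AttributeError:
            _tls.context = Context()
            return _tls.context

    def get_context():
        return (_context_getter or _default_getter)()

    def set_context_getter(getter):
        # Plug in another scheme: a plain global, something an
        # external library computes, etc.
        global _context_getter
        _context_getter = getter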

> I think explicit context is much less error-prone. Even a single global
> instance (like Twisted's reactor) would be better :-)

I find that passing the context around everywhere makes for awkward APIs though.

> As for the rest of the scheduling module, I can't say much since I have
> a hard time reading and understanding it.

That's a problem; I need to write this up properly so that everyone
can understand it.

>> To invoke a primitive I/O operation, you call the current task's
>> block() method and then immediately yield (similar to Greg Ewing's
>> approach). There are helpers block_r() and block_w() that arrange for
>> a task to block until a file descriptor is ready for reading/writing.
>> Examples of their use are in sockets.py.
>
> That's weird and kindof ugly IMHO. Why would you write:
>
>         scheduling.block_w(self.sock.fileno())
>         yield
>
> instead of say:
>
>         yield scheduling.block_w(self.sock.fileno())
>
> ?

This has been debated ad nauseam already (be glad you missed it);
basically, there's not a whole lot of difference, but if some APIs
require "yield X(args)" and others require "yield from Y(args)",
that's really confusing. The "bare yield only" style makes it
possible (though I didn't implement it here) to put some strict
checks in the scheduler -- next() should never return anything except
None. But there are other ways to do that too.

Anyway, I probably will change the API so that e.g. sockets.py doesn't
have to use this paradigm; I'll just wrap these low-level APIs in a
proper "coroutine" and then sockets.py can just use "yield from
block_r(fd)". (This is one reason why I like the "bare generators with
yield from" approach that Greg Ewing and PEP 380 recommend: it's
really cheap to wrap an API in an extra layer of yield-from. See the
yyftime.py benchmark I added to the tulip directory.)
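
The wrapper is trivial -- something like this (a sketch, assuming
tulip's scheduling module is importable):

    import scheduling

    def block_r(fd):
        # COROUTINE: block the current task until fd is readable.
        scheduling.block_r(fd)  # internal: registers with the scheduler
        yield                   # bare yield suspends; scheduler resumes us

so that sockets.py can write "yield from block_r(fd)" and never touch
the internal bare-yield protocol.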

> Also, the fact that each call to SocketTransport.{recv,send} explicitly
> registers then removes the fd on the event loop looks wasteful.

I am hoping to add some optimization for this -- I am actually
planning a hackathon (or re-education session :-) with some Twisted
folks where I hope they'll explain to me how they do this.
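
One plausible shape for that optimization (purely hypothetical on my
part -- not what the code does today) is to register the fd once and
keep it registered, swapping only the callback:

    class PersistentReader:
        # Register with the event loop once; subsequent waits just
        # store a callback -- no add/remove churn per recv().

        def __init__(self, eventloop, fd):
            self.fd = fd
            self.callback = None
            eventloop.add_reader(fd, self._on_readable)  # once

        def _on_readable(self):
            if self.callback is not None:
                cb, self.callback = self.callback, None
                cb()  # resume whichever task is blocked on this fd

        def wait(self, callback):
            self.callback = callback  # cheap: no syscalls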

> By the way, even when a fd is signalled ready, you must still be
> prepared for recv() to return EAGAIN (see
> http://bugs.python.org/issue9090).

Yeah, I should know; I ran into this for a Google project too (there
was a kernel driver that was lying...). I had a cryptic remark in my
post above referring to this.
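
In code, the defensive version is just a loop (a sketch, reusing the
block_r() coroutine wrapper from above):

    import errno

    def robust_recv(sock, n):
        # COROUTINE: recv() that survives a spurious readiness wakeup.
        while True:
            try:
                return sock.recv(n)
            except OSError as exc:
                if exc.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
                    raise
            # Not actually ready yet: block until readable, retry.
            yield from block_r(sock.fileno())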

>> In the docstrings I use the prefix "COROUTINE:" to indicate public
>> APIs that should be invoked using yield from.
>
> Hmm, should they? Your approach looks a bit weird: you have functions
> that should use yield, and others that should use "yield from"? That
> sounds confusing to me.

Yeah, see above.

> I'd much rather either have all functions use "yield", or have all
> functions use "yield from".

Agreed, and I'm strongly in favor of "yield from". The block_r() +
yield pattern is considered an *internal* API.

> (also, I wouldn't be shocked if coroutines had to wear a special
> decorator; it's a better marker than having the word COROUTINE in the
> docstring, anyway :-))

Agreed it would be useful as documentation, and maybe an API can use
this to enforce proper coding style. It would have to be purely
decoration though -- I don't want an extra layer of wrapping to occur
each time you call a coroutine. (I.e. the decorator should just return
"func".)

>> sockets.py: http://code.google.com/p/tulip/source/browse/sockets.py
>>
>> This implements some internet primitives using the APIs in
>> scheduling.py (including block_r() and block_w()). I call them
>> transports but they are different from Twisted's transports; they are
>> closer to idealized sockets. SocketTransport wraps a plain socket,
>> offering recv() and send() methods that must be invoked using yield
>> from. SslTransport wraps an ssl socket (luckily in Python 2.6 and up,
>> stdlib ssl sockets have good async support!).
>
> SslTransport.{recv,send} need the same kind of logic as do_handshake():
> catch both SSLWantReadError and SSLWantWriteError, and call block_r /
> block_w accordingly.

Oh... Thanks for the tip. I didn't find this in the ssl module docs.
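
So recv() would grow the same loop as do_handshake() -- roughly this
(a sketch, reusing the hypothetical block_r()/block_w() coroutines):

    import ssl

    def ssl_recv(sslsock, n):
        # COROUTINE: an SSL read may need the fd writable first
        # (renegotiation), and an SSL write may need it readable.
        while True:
            try:
                return sslsock.recv(n)
            except ssl.SSLWantReadError:
                yield from block_r(sslsock.fileno())
            except ssl.SSLWantWriteError:
                yield from block_w(sslsock.fileno())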

>> Then there is a
>> BufferedReader class that implements more traditional read() and
>> readline() coroutines (i.e., to be invoked using yield from), the
>> latter handy for line-oriented transports.
>
> Well... It would be nice if BufferedReader could re-use the actual
> io.BufferedReader and its fast readline(), read(), readinto()
> implementations.

Agreed, I would love that too, but the problem is, *this*
BufferedReader defines methods you have to invoke with yield from.
Maybe we can come up with a solution for sharing code by modifying the
_io module though; that would be great! (I've also been thinking of
layering TextIOWrapper on top of these.)
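
For reference, the shape of the yield-from readline() is roughly this
(a sketch of the idea, not the actual tulip code; trans is assumed to
be a transport whose recv() is a coroutine):

    class BufferedReader:

        def __init__(self, trans):
            self.trans = trans  # e.g. a SocketTransport
            self.buffer = b''

        def readline(self):
            # COROUTINE: accumulate until b'\n' or EOF; every buffer
            # refill goes through a coroutine-style recv(), which is
            # why this can't just reuse io.BufferedReader.readline().
            while b'\n' not in self.buffer:
                data = yield from self.trans.recv(8192)
                if not data:  # EOF
                    break
                self.buffer += data
            line, sep, rest = self.buffer.partition(b'\n')
            self.buffer = rest
            return line + sep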

Thanks for the thorough review!

-- 
--Guido van Rossum (python.org/~guido)
