[Python-ideas] Async API: some code to review

Mon Oct 29 19:43:57 CET 2012

On Mon, Oct 29, 2012 at 11:08 AM, Giampaolo Rodolà <g.rodola at gmail.com> wrote:
> 2012/10/29 Guido van Rossum <guido at python.org>
>>
>> I'm most interested in feedback on the design of polling.py and
>> scheduling.py, and to a lesser extent on the design of sockets.py;
>> main.py is just an example of how this style works out in practice.
>
> My comments follow.
>
> === About polling.py ===
>
> 1 - I think DelayedCall should have a reset() method, other than just cancel().

So, essentially an uncancel()? Why not just re-register in that case?
Or what's your use case? (Right now there's no problem in calling one
of these many times -- it's just that cancellation is permanent.)
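
A minimal sketch of the re-register idea, not the actual polling.py
code. MiniLoop and reset() are hypothetical names, and this DelayedCall
is only a stand-in for the real class: it shows reset() expressed as
cancel() plus a fresh call_later().

```python
import heapq
import itertools
import time


class DelayedCall:
    """Hypothetical stand-in for the DelayedCall in polling.py."""

    def __init__(self, when, callback, args):
        self.when = when
        self.callback = callback
        self.args = args
        self.cancelled = False

    def cancel(self):
        # Cancellation is permanent; there is no uncancel().
        self.cancelled = True


class MiniLoop:
    """Hypothetical toy loop that only knows about timed callbacks."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal deadlines

    def call_later(self, delay, callback, *args):
        dcall = DelayedCall(time.monotonic() + delay, callback, args)
        heapq.heappush(self._heap, (dcall.when, next(self._counter), dcall))
        return dcall

    def run(self):
        while self._heap:
            when, _, dcall = heapq.heappop(self._heap)
            if dcall.cancelled:
                continue  # skip cancelled entries without waiting for them
            time.sleep(max(0.0, when - time.monotonic()))
            dcall.callback(*dcall.args)


def reset(loop, dcall, delay):
    """Emulate reset(): cancel the old call and register a fresh one."""
    dcall.cancel()
    return loop.call_later(delay, dcall.callback, *dcall.args)
```

The caller has to keep the object returned by reset(), since the
original DelayedCall stays cancelled.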

> 2 - EventLoopMixin should have a call_every() method other than just
> call_later()

Arguably you can emulate that with a simple loop:

def call_every(secs, func, *args):
    while True:
        yield from scheduler.sleep(secs)
        func(*args)

(Flavor to taste to log exceptions, handle cancellation, automatically
spawn a separate task, etc.)

I can build lots of other useful things out of call_soon() and
call_later() -- but I do need at least those two as "axioms".
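
For callback-style code, the same thing can be built from call_later()
alone. The sketch below uses asyncio's loop.call_later() purely as a
stand-in for an event loop with that method (an assumption; it is not
the code under review), and the stop() function it returns is a
hypothetical convenience.

```python
import asyncio


def call_every(loop, secs, func, *args):
    """Call func(*args) every `secs` seconds; return a function that stops it."""
    state = {"handle": None, "stopped": False}

    def tick():
        if state["stopped"]:
            return
        func(*args)
        # Re-arm only if func() did not stop the repetition itself.
        if not state["stopped"]:
            state["handle"] = loop.call_later(secs, tick)

    def stop():
        state["stopped"] = True
        if state["handle"] is not None:
            state["handle"].cancel()

    state["handle"] = loop.call_later(secs, tick)
    return stop


async def demo():
    loop = asyncio.get_running_loop()
    ticks = []
    stop = call_every(loop, 0.01, ticks.append, "tick")
    await asyncio.sleep(0.055)
    stop()
    return ticks
```

Running asyncio.run(demo()) returns the list of recorded ticks. Note
that this version drifts: each interval starts after func() returns.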

> 3 - call_later() and call_every() should also take **kwargs other than
> just *args

I just replied to that in a previous message; there's also a comment
in the code. How important is this really? Are there lots of use cases
that require you to pass keyword args? If it's only on occasion you
can use a lambda. (The *args is a compromise so we don't need a lambda
to wrap every callback. But I want to reserve keyword args for future
extensions to the registration functions.)
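
To illustrate the occasional-lambda approach: the call_later() below is
a hypothetical placeholder that only binds the callback (it schedules
nothing), and greet() is made up for the example. functools.partial is
shown as an equivalent to the lambda.

```python
import functools


def call_later(delay, callback, *args):
    """Hypothetical registration function: positional args only."""
    # A real loop would schedule this; here we just return the bound call.
    return lambda: callback(*args)


def greet(name, punctuation="."):
    return "hello " + name + punctuation


# Positional args go straight through *args.
plain = call_later(1.0, greet, "world")

# Keyword args are wrapped in a lambda ...
with_lambda = call_later(1.0, lambda: greet("world", punctuation="!"))

# ... or, equivalently, in functools.partial.
with_partial = call_later(
    1.0, functools.partial(greet, "world", punctuation="?"))
```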

> 4 - I think PollsterBase should provide a method to modify() the
> events registered for a certain fd (both poll() and epoll() have such
> a method and it's faster compared to un/registering a fd).

Did you see the concrete implementations? Those where this matters
implicitly use modify() if the required flags change. I can imagine
more optimizations of the implementations (e.g. delaying
register()/modify() calls until poll() is actually called, to avoid
unnecessary churn) without making the API more complex.
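
A rough sketch of what "implicitly use modify()" can look like. This is
not the actual polling.py code: the class, the method names and the
READ/WRITE masks are assumptions, and the backend is any object with
the register()/modify()/unregister() methods of select.poll() or
select.epoll().

```python
READ = 0x01   # illustrative masks; real code would use select.POLLIN
WRITE = 0x04  # and select.POLLOUT (or the epoll equivalents)


class Pollster:
    """Hypothetical wrapper that hides register/modify/unregister."""

    def __init__(self, backend):
        self._backend = backend
        self._flags = {}  # fd -> currently registered event mask

    def _update(self, fd, flags):
        old = self._flags.get(fd, 0)
        if flags == old:
            return  # nothing changed, so no system call is needed
        if not old:
            self._backend.register(fd, flags)
        elif flags:
            self._backend.modify(fd, flags)
        else:
            self._backend.unregister(fd)
        if flags:
            self._flags[fd] = flags
        else:
            del self._flags[fd]

    def add_reader(self, fd):
        self._update(fd, self._flags.get(fd, 0) | READ)

    def add_writer(self, fd):
        self._update(fd, self._flags.get(fd, 0) | WRITE)

    def remove_reader(self, fd):
        self._update(fd, self._flags.get(fd, 0) & ~READ)

    def remove_writer(self, fd):
        self._update(fd, self._flags.get(fd, 0) & ~WRITE)


class RecordingBackend:
    """Fake backend that records calls, for demonstration only."""

    def __init__(self):
        self.calls = []

    def register(self, fd, flags):
        self.calls.append(("register", fd, flags))

    def modify(self, fd, flags):
        self.calls.append(("modify", fd, flags))

    def unregister(self, fd):
        self.calls.append(("unregister", fd))
```

With this shape, a caller that adds a reader, then a writer, then
removes the writer gets register(), modify(), modify() on the backend
without ever asking for modify() explicitly.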

> Feel free to take a look at my scheduler implementation which looks
> quite similar to what you've done in polling.py:
> http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#85

Thanks, I had seen it previously. I think this also proves that
there's nothing particularly earth-shattering about this design. :-)
I'd love to copy some more of your tricks, e.g. the occasional
re-heapifying. (What usage pattern is this dealing with exactly?) I
should also check that I've taken care of all the various flags and
other details (I recall being quite surprised that with poll(), on
some platforms I need to check for POLLHUP but not on others).

> === About sockets.py ===
>
> 1 - In SocketTransport it seems there's no error handling provisioned
> for send() and recv().
> You should expect these errors
> http://hg.python.org/cpython/file/95931c48a76f/Lib/asyncore.py#l60
> signaling disconnection plus EWOULDBLOCK and EAGAIN for "retry"

Right, I know I have been naive about these and have already got a TODO note.
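
For reference, a sketch of the kind of handling that TODO would need.
The helper name and its return convention are hypothetical, and the
DISCONNECTED set is an assumption modeled on the asyncore code linked
above, not a definitive list.

```python
import errno
import socket

# Assumption: errno values that asyncore treats as "peer disconnected".
DISCONNECTED = frozenset((errno.ECONNRESET, errno.ENOTCONN, errno.ESHUTDOWN,
                          errno.ECONNABORTED, errno.EPIPE, errno.EBADF))
# "Nothing to do right now, try again when the fd is ready."
RETRY = frozenset((errno.EWOULDBLOCK, errno.EAGAIN))


def try_recv(sock, nbytes):
    """Return data, b"" on disconnect, or None if the caller should retry."""
    try:
        return sock.recv(nbytes)
    except OSError as exc:
        if exc.errno in RETRY:
            return None
        if exc.errno in DISCONNECTED:
            return b""
        raise  # anything else is a genuine error
```

A send() wrapper would follow the same pattern, returning the number of
bytes sent (0 for "retry") instead of data.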

> 2 - SslTransport's send() and recv() methods should suffer the same problem.

Ditto, Antoine told me.

> 3 - I don't fully understand how data transfer works exactly but keep
> in mind that the transport should interact with the pollster.
> What I mean is that generally speaking a connected socket should
> *always* be readable ("r"), even when it's idle, then switch to "rw"
> events when sending data, then get back to "r" when all the data has
> been sent.
> This is *crucial* if you want to achieve high performances/scalability
> and that is why PollsterBase should probably provide a modify()
> method.
> Please take a look at what I've done here:
> http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#809

Hm. I am not convinced that managing this explicitly from the
transport is the right solution (note that my transports are quite
different from those in Twisted). But I'll keep this in mind -- I
would like to set up a benchmark suite at some point. I will probably
have to implement the server side of HTTP for that purpose, so I can
point e.g. ab at my app.

> === Other considerations ===
>
> This 'yield' / 'yield from' approach is new to me (I'm more of a
> "callback guy") so I can't say I fully understand what's going on just
> by reading the code.

Fair enough. You should probably start by reading Greg Ewing's
tutorial -- it's short and sweet:
http://www.cosc.canterbury.ac.nz/greg.ewing/python/tasks/SimpleScheduler.html

> What I would like to see instead of main.py is a bunch of code samples
> / demos showing how this library is supposed to be used in different
> circumstances.

Agreed, more examples are needed.

> In details I'd like to see at least:
>
> 1 - a client example (connect(), send() a string, recv() a response, close())

Hm, that's all in urlfetch().

> 2 - an echo server example (accept(), recv() a string, send() it back, close())

Yes, that's missing.

> 3 - how to use a different transport (e.g. UDP)?

I haven't looked into this yet. I expect I'll have to write a
different SocketTransport for this (the existing transports are
implicitly stream-oriented) but I know that the scheduler and
eventloop implementation can handle this fine.

> 4 - how to run long running tasks in a thread?

That's implemented. Check out call_in_thread(). Note that you can pass
it an alternate threadpool (executor).
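
The underlying idea, sketched with concurrent.futures directly. The
helper below is deliberately named run_in_thread() because it is a
hypothetical illustration, not the signature of call_in_thread(); it
returns a plain Future rather than integrating with the scheduler.

```python
import concurrent.futures
import time

# Assumption: a shared default pool, used when no executor is passed.
_default_executor = concurrent.futures.ThreadPoolExecutor(max_workers=5)


def slow_square(n):
    time.sleep(0.01)  # stands in for blocking work
    return n * n


def run_in_thread(func, *args, executor=None):
    """Hypothetical helper: run func(*args) in a pool, return a Future."""
    if executor is None:
        executor = _default_executor
    return executor.submit(func, *args)


def demo():
    # Default pool.
    first = run_in_thread(slow_square, 3).result()
    # Alternate pool supplied by the caller.
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        second = run_in_thread(slow_square, 7, executor=pool).result()
    return first, second
```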

> Also:
>
> 5 - is it possible to use multiple "reactors" in different threads?

Should be possible.

> How?  (asyncore for example achieves this by providing a separate
> 'map' argument for both the 'reactor' and the dispatchers)

It works by making the Context class use thread-local storage (TLS).
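
A minimal sketch of the TLS idea, assuming a classmethod accessor. The
method name and the placeholder loop object are hypothetical; the real
Context class may be organized differently.

```python
import threading


class Context:
    """Hypothetical per-thread context holding that thread's event loop."""

    _local = threading.local()

    @classmethod
    def get_eventloop(cls):
        loop = getattr(cls._local, "eventloop", None)
        if loop is None:
            # Placeholder: a real implementation would create an event loop.
            loop = cls._local.eventloop = object()
        return loop


def loop_seen_in_new_thread():
    """Return the loop object that a freshly started thread gets."""
    seen = []
    worker = threading.Thread(
        target=lambda: seen.append(Context.get_eventloop()))
    worker.start()
    worker.join()
    return seen[0]
```

Each thread that calls Context.get_eventloop() gets its own loop, so
no explicit 'map' argument has to be threaded through the API.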

> I understand you just started with this so I'm probably asking too
> much at this point in time.
> Feel free to consider this a kind of a "long term review".

You have asked many useful questions already. Since you have
implemented a real-world I/O loop yourself, your input is extremely
valuable. Thanks, and keep at it!

-- 
--Guido van Rossum (python.org/~guido)