[Python-ideas] Async API: some code to review

Giampaolo Rodolà g.rodola at gmail.com
Mon Oct 29 22:20:44 CET 2012


2012/10/29 Guido van Rossum <guido at python.org>:
> On Mon, Oct 29, 2012 at 11:08 AM, Giampaolo Rodolà <g.rodola at gmail.com> wrote:
>> 2012/10/29 Guido van Rossum <guido at python.org>
>> === About polling.py ===
>>
>> 1 - I think DelayedCall should have a reset() method, other than just cancel().
>
> So, essentially an uncancel()? Why not just re-register in that case?
> Or what's your use case? (Right now there's no problem in calling one
> of these many times -- it's just that cancellation is permanent.)

The most common use case is when you want to disconnect the other peer
after a certain time of inactivity.
Ideally what you would do is schedule() a idle/timeout function and
reset() it every time the other peer sends you some data.


>> 2 - EventLoopMixin should have a call_every() method other than just
>> call_later()
>
> Arguably you can emulate that with a simple loop:
>
> def call_every(secs, func, *args):
>     while True:
>         yield from scheduler.sleep(secs)
>         func(*args)
>
> (Flavor to taste to log exceptions, handle cancellation, automatically
> spawn a separate task, etc.)
>
> I can build lots of other useful things out of call_soon() and
> call_later() -- but I do need at least those two as "axioms".

Agreed.


>> 3 - call_later() and call_every() should also take **kwargs other than
>> just *args
>
> I just replied to that in a previous message; there's also a comment
> in the code. How important is this really? Are there lots of use cases
> that require you to pass keyword args? If it's only on occasion you
> can use a lambda. (The *args is a compromise so we don't need a lambda
> to wrap every callback. But I want to reserve keyword args for future
> extensions to the registration functions.)

It's not crucial to have kwargs, just nice, but I understand your
motives to rule them out, in fact I reserved two kwarg names
('_errback' and '_scheduler') for the same reason.
In my experience I learned that passing an extra error handler
function (what Twisted calls 'errrback') can be desirable, so that's
another thing you might want to consider.
In my scheduler implementation I achieved that by passing an _errback
keyword parameter, like this:

>>> ioloop.call_later(30, callback, _errback=err_callback)

Not very nice to use a reserved keyword, I agree.
Perhaps you can keep ruling out kwargs referred to the callback
function and change the current call_later signature as such:

-    def call_later(self, when, callback, *args):
+    def call_later(self, when, callback, *args, errback=None):

...or maybe provide a DelayedCall.add_errback() method a-la Twisted.

> Thanks, I had seen it previously, I think this also proves that
> there's nothing particularly earth-shattering about this design. :-)
> I'd love to copy some more of your tricks,

Sure, go on. It's MIT licensed code.


> e.g. the occasional re-heapifying. (What usage pattern is this
> dealing with exactly?)

It's intended to avoid making the list grow with too many cancelled functions.
Imagine this use case:

WEEK = 60 x 60 x 24 x 7
for x in xrange(1000000):
    f = call_later(WEEK, fun)
    f.cancel()

You'll end up having a heap with milions of cancelled items which will
be freed after a week.
Instead you can keep track of the number of cancelled functions every
time cancel() is called and re-heapify the list when that number gets
too high:
http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#122


> should also check that I've taken care of all the various flags and
> other details (I recall being quite surprised that with poll(), on
> some platforms I need to check for POLLHUP but not on others).

Yeah, that's a painful part.
Try to look here:
http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#464
Instead of handle_close()ing you should add the fd to the list of
readable ones ("r").
The call to recv() which will be coming next will then cause the
socket to close (you have to add the error handling to recv() first
though).


>> 3 - I don't fully understand how data transfer works exactly but keep
>> in mind that the transport should interact with the pollster.
>> What I mean is that generally speaking a connected socket should
>> *always* be readable ("r"), even when it's idle, then switch to "rw"
>> events when sending data, then get back to "r" when all the data has
>> been sent.
>> This is *crucial* if you want to achieve high performances/scalability
>> and that is why PollsterBase should probably provide a modify()
>> method.
>> Please take a look at what I've done here:
>> http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#809
>
> Hm. I am not convinced that managing this explicitly from the
> transport is the right solution (note that my transports are quite
> different from those in Twisted). But I'll keep this in mind -- I
> would like to set up a benchmark suite at some point. I will probably
> have to implement the server side of HTTP for that purpose, so I can
> point e.g. ab at my app.

I think you might want to apply that to something slighlty higher
level than the mere transport.
Something like the equivalent of asynchat.push /
asynchat.push_with_producer, if you'll ever want to go that far in
terms of abstraction, or maybe avoid that at all but make it clear in
the doc that the user should take care of that.
My point is that having a socket registered for both "r" AND "w"
events when in fact you want only "r" OR "w" is an exponential waste
of CPU cycles and it should be avoided either by the lib or by the
user.
"old select() implementation" vs "new select() implementation"
benchmark shown here reflects exactly this problem which still affects
base asyncore module:
https://code.google.com/p/pyftpdlib/issues/detail?id=203#c6


I'll keep following the progress on this and hopefully come up with
another set of questions and/or random thoughts.


--- Giampaolo
https://code.google.com/p/pyftpdlib/
https://code.google.com/p/psutil/
https://code.google.com/p/pysendfile/



More information about the Python-ideas mailing list