[Python-ideas] asyncore: included batteries don't fit

Josiah Carlson josiah.carlson at gmail.com
Wed Sep 26 19:02:24 CEST 2012


On Wed, Sep 26, 2012 at 1:17 AM, chrysn <chrysn at fsfe.org> wrote:
> On Mon, Sep 24, 2012 at 03:31:37PM -0700, Giampaolo Rodolà wrote:
>> From a chronological standpoint I still think the best thing to do in order
>> to fix the "python async problem" once and for all is to first define and
>> possibly implement an "async WSGI interface" describing what a standard
>> async IO loop/reactor should look like (in terms of API) and how to
>> integrate with it, see:
>> http://mail.python.org/pipermail/python-ideas/2012-May/015223.html
>> http://mail.python.org/pipermail/python-ideas/2012-May/015235.html
>
> i wasn't aware that pep 3153 exists. given that, my original intention
> of this thread should be re-worded into "let's get pep3153 along!".

Go ahead and read PEP 3153, we will wait.

A careful reading of PEP 3153 will tell you that the intent is to make
a "light" version of Twisted built into Python. There isn't any
discussion as to *why* this is a good idea, it just lays out the plan
of action. Its ideas were gathered from the experience of the Twisted
folks.

Their experience is substantial, but in the intervening 1.5+ years
since PyCon 2011, only the barest of abstract interfaces has been
defined (https://github.com/lvh/async-pep/blob/master/async/abstract.py),
and no discussion has taken place as to forward migration of the
(fairly large) body of existing asyncore code.

> i'm not convinced by the api suggested in the first mail, as it sounds
> very unix-centric (poll, read/write/error). i rather imagined leaving
> the details of the callbackable/mainloop interaction to be platform
> details. (a win32evtlog event source just couldn't possibly register
> with a select()-based main loop). i'd prefer to keep the part that

Of course not, but then again no one would attempt to do as much. They
would use a WSAEvent reactor, because that's the only thing that it
would work with. That said, WSAEvent should arguably be the default on
Windows, so this issue shouldn't even come up there. Also, worrying
about platform-specific details like "what if someone uses a source
that is relatively uncommon on the platform" is a red herring; get the
interface/api right, build it, and start using it.

To the point, Giampaolo already has a reactor that implements the
interface (more or less "idea #3" from his earlier message), and it's
been used in production (under staggering ftp(s) load). Even better,
it offers effectively transparent replacement of the existing asyncore
loop, and supports existing asyncore-derived classes. It is available:
https://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py
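For readers who want the idea without opening the link: the following is only a minimal sketch, under stated assumptions, of how a loop can keep driving existing asyncore-style channels unchanged while swapping the multiplexing mechanism underneath. `DispatcherLoop` is a hypothetical name, not the API of pyftpdlib's ioloop, and it uses the stdlib `selectors` module, which was added in Python 3.4, after this thread. Channels are assumed to expose the asyncore.dispatcher methods `fileno()`, `readable()`, `writable()`, `handle_read_event()` and `handle_write_event()`.

```python
import selectors


class DispatcherLoop:
    """Hypothetical loop driving asyncore.dispatcher-style channels.

    Like asyncore.poll(), it asks every channel what it wants on each pass,
    so existing readable()/writable() logic keeps working unchanged.
    """

    def __init__(self, selector_cls=selectors.DefaultSelector):
        self._selector = selector_cls()
        self._channels = {}  # fileno -> channel

    def add(self, channel):
        self._channels[channel.fileno()] = channel

    def poll(self, timeout=None):
        for fd, channel in self._channels.items():
            mask = 0
            if channel.readable():
                mask |= selectors.EVENT_READ
            if channel.writable():
                mask |= selectors.EVENT_WRITE
            registered = fd in self._selector.get_map()
            if mask and registered:
                self._selector.modify(fd, mask, channel)
            elif mask:
                self._selector.register(fd, mask, channel)
            elif registered:
                self._selector.unregister(fd)
        if not self._selector.get_map():
            return  # nothing to wait on (select() on no fds fails on Windows)
        for key, events in self._selector.select(timeout):
            channel = key.data
            if events & selectors.EVENT_READ:
                channel.handle_read_event()
            if events & selectors.EVENT_WRITE:
                channel.handle_write_event()
```

Passing a different `selector_cls` changes the platform mechanism without touching any channel class, which is the property that makes a drop-in replacement for the asyncore loop plausible.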

> registers with the main loop concentrated to a very low-level common
> denominator. for unix, that'd mean that there is a basic callbackable
> for "things that receive events because they have a fileno". everything
> above that, eg the distinction whether an "r" event means that we can
> read() or that we must accept(), could happen at a higher level and wouldn't
> have to be concerned with the main loop integration any more.
>
> in case (pseudo)code gets the idea over better:
>
> class UnixFilehandle(object):
>     def __init__(self, fileno):
>         self._fileno = fileno
>
>     def register_with_main_loop(self, mainloop):
>         # it might happen that the main loop doesn't support unix
>         # filenos. tough luck, in that case -- the developer should
>         # select a more suitable main loop.
>         mainloop.register_unix_fileno(self._fileno, self)
>
>     def handle_r_event(self):
>         raise NotImplementedError("Not configured to receive that sort of event")
>     # if you're sure you'd never receive any anyway, you can
>     # not-register them by setting them None in the subclass
>     handle_w_event = handle_e_event = handle_r_event
>
> class SocketServer(UnixFilehandle):
>     def __init__(self, socket):
>         self._socket = socket
>         UnixFilehandle.__init__(self, socket.fileno())
>
>     def handle_r_event(self):
>         # a listening socket becomes readable when a connection is pending
>         self.handle_accept_event(self._socket.accept())
>
>     def handle_accept_event(self, conn_and_addr):
>         # subclasses decide what to do with the (connection, address) pair
>         raise NotImplementedError
>
> other interfaces parallel to the file handle interface would, for
> example, handle unix signals. (built atop of that, like the
> accept-handling socket server, could be one that deals with child
> processes.) the interface for android might look different again,
> because there is no main loop and select never gets called by the
> application.

That is, incidentally, what Giampaolo has implemented already. I
encourage you to read the source I linked above.
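The layering from the quoted pseudocode (a main loop that only knows about filenos, with handlers deciding what a readiness event means) can be sketched as below. This is an illustration under assumptions, not Giampaolo's implementation: `FilenoReactor`, `ListeningSocket` and `run_once()` are hypothetical names, while `register_unix_fileno()` and `register_with_main_loop()` are borrowed from the pseudocode.

```python
import selectors


class FilenoReactor:
    """Hypothetical low-level reactor: it only knows "things with a fileno"."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()

    def register_unix_fileno(self, fileno, handler, events=selectors.EVENT_READ):
        self._selector.register(fileno, events, handler)

    def unregister(self, fileno):
        self._selector.unregister(fileno)

    def run_once(self, timeout=None):
        # Dispatch raw readiness; interpreting it is the handler's job.
        for key, events in self._selector.select(timeout):
            handler = key.data
            if events & selectors.EVENT_READ:
                handler.handle_r_event()
            if events & selectors.EVENT_WRITE:
                handler.handle_w_event()


class ListeningSocket:
    """Higher layer: for a listening socket, a read event means accept()."""

    def __init__(self, sock, on_accept):
        self._socket = sock
        self._on_accept = on_accept  # hypothetical callback(conn, addr)

    def register_with_main_loop(self, mainloop):
        mainloop.register_unix_fileno(self._socket.fileno(), self)

    def handle_r_event(self):
        conn, addr = self._socket.accept()
        self._on_accept(conn, addr)

    def handle_w_event(self):
        raise NotImplementedError("listening sockets never wait for writability")
```

The reactor never learns that accept() exists; swapping `ListeningSocket` for a data-socket handler changes the meaning of the same read event without touching the loop.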

>> From there the python stdlib *might* grow a new module implementing the
>> "async WSGI interface" (let's call it asyncore2) and some of the stdlib
>> batteries such as socketserver can possibly use it.
>>
>> In my mind this is the ideal long-term scenario but even managing to define
>> an "async WSGI interface" alone would be a big step forward.
>
> i'd welcome such an interface. if asyncore can then be retrofitted to
> accept that interface too w/o breaking compatibility, it'd be nice, but
> if not, it's asyncore2, then.

Easily done, because it's already been done ;)

>> Again, at this point in time what you're proposing looks too vague,
>> ambitious and premature to me.
>
> please don't get me wrong -- i'm not proposing anything for immediate
> action, i just want to start a thinking process towards a better
> integrated stdlib.

I am curious as to what you mean by "a better integrated stdlib". A
new interface that doesn't allow people to easily migrate from an
existing (and long-lived, though flawed) standard library module is not
better integration. Better integration requires allowing previous
users to migrate, while encouraging new users to join in with any
later development. That's what Giampaolo's suggested interface offers
at the lowest level: something to handle file-handle reactors,
combined with a scheduler.
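Combining a file-handle reactor with a scheduler mostly comes down to one detail: the select timeout is bounded by the nearest scheduled deadline, so timed callbacks run on the same loop as IO handlers. A minimal sketch under that assumption; `ScheduledReactor`, `register()`, `call_later()` and `run_once()` are hypothetical names, not the API of Giampaolo's proposal.

```python
import heapq
import itertools
import selectors
import time


class ScheduledReactor:
    """Hypothetical file-handle reactor combined with a scheduler."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()
        self._timers = []  # heap of (deadline, seq, callback, args)
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared

    def register(self, fileobj, events, callback):
        # callback(fileobj, events) is called when fileobj is ready.
        self._selector.register(fileobj, events, callback)

    def call_later(self, delay, callback, *args):
        deadline = time.monotonic() + delay
        heapq.heappush(self._timers, (deadline, next(self._seq), callback, args))

    def run_once(self, max_wait=1.0):
        # Never block past the nearest scheduled deadline.
        timeout = max_wait
        if self._timers:
            timeout = min(timeout, max(0.0, self._timers[0][0] - time.monotonic()))
        if self._selector.get_map():
            for key, events in self._selector.select(timeout):
                key.data(key.fileobj, events)
        else:
            time.sleep(timeout)  # no handles registered, so just wait for timers
        # Run every scheduled call whose deadline has passed.
        now = time.monotonic()
        while self._timers and self._timers[0][0] <= now:
            _, _, callback, args = heapq.heappop(self._timers)
            callback(*args)
```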

From there, whether Twisted-like layers evolve on top of it, or shallower
layers (like many existing asyncore-derived classes), is yet to be
determined by the people actually using it.

> On Mon, Sep 24, 2012 at 05:02:08PM -0700, Josiah Carlson wrote:
>> 1. Whatever reactors are available, you need to be able to instantiate
>> multiple of different types of reactors and multiple instances of the
>> same type of reactor simultaneously (to support multiple threads
>> handling different groups of reactors, or different reactors for
>> different types of objects on certain platforms). While this allows
>> for insanity in the worst case, we're all consenting adults here, so we
>> shouldn't be limited by reactor singletons. There should be a default
>> reactor class, which is defined on module/package import (use the
>> "best" one for the platform).
>
> i think that's already common. with asyncore, you can have different
> maps (just one is installed globally as default). with the gtk main
> loop, it's a little tricky (the gtk.main() function doesn't simply take
> an argument), but the underlying glib can do that afaict.

Remember that a reactor isn't just a dictionary of file handles to act
on; it's the thing that determines what underlying platform
mechanics will be used to multiplex across channels. But that level of
detail will generally go unused, as most people will only use one
reactor at a time. The point of offering multiple reactors is to
allow people to be flexible if they choose (or to pick from the
different reactors if they know that one is faster for their number of
expected handles).
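A sketch of non-singleton reactors, assuming the stdlib `selectors` module as the multiplexing layer: `selectors.DefaultSelector` already picks the most efficient implementation available on the current platform, and any other selector class can be passed explicitly. `Reactor` and `run_in_thread()` are hypothetical names; the test runs two independent reactors, each with its own mechanism, on separate threads.

```python
import selectors
import threading


class Reactor:
    """Hypothetical reactor; every instance is independent (no singleton)."""

    def __init__(self, selector_cls=selectors.DefaultSelector):
        # DefaultSelector is the most efficient mechanism available on this
        # platform; callers may pass e.g. selectors.SelectSelector instead.
        self._selector = selector_cls()

    def register(self, sock, callback):
        self._selector.register(sock, selectors.EVENT_READ, callback)

    def run_once(self, timeout=None):
        for key, _events in self._selector.select(timeout):
            key.data(key.fileobj)


def run_in_thread(reactor, timeout):
    """Run a single iteration of `reactor` on its own thread."""
    thread = threading.Thread(target=reactor.run_once, args=(timeout,))
    thread.start()
    return thread
```

Because nothing is global, a program that never asks for more than `Reactor()` gets the platform default, while one that knows its handle count can choose the mechanism per instance.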

>> 2. The API must be simple. I am not sure that it can get easier than
>> Idea #3 from:
>> http://mail.python.org/pipermail/python-ideas/2012-May/015245.html
>
> it's good that the necessities of call_later and call_every are
> mentioned here, i'd have forgotten about them.
>
> we've talked about many things we'd need in a python asynchronous
> interface (not implementation), so what are the things we *don't* need?
> (so we won't start building a framework like twisted). i'll start:
>
> * high-level protocol handling (can be extra modules atop of it)
> * ssl
> * something like the twisted deferred framework (not sure about that, i
>   guess the twisted people will have good reason to use it, but i don't
>   see compelling reasons for such a thing in a minimal interface from my
>   limited pov)
> * explicit connection handling (retries, timeouts -- would be up to the
>   user as well, eg urllib might want to set up a timeout and retries for
>   asynchronous url requests)

I disagree with the last 3. If you have an IO loop, more often than
not you want an opportunity to do something later in the same context.
This is commonly the case for bandwidth limiting, connection timeouts,
etc., which are otherwise *very* difficult to do at a higher level;
that is why schedulers are built into IO loops.
Further, SSL in async code can be tricky to get right. Having that
20-line SSL layer available as a class is a good idea; it will save
people from re-inventing it (poorly or incorrectly) every
time.
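To illustrate why timeouts belong in the loop's scheduler, here is a minimal sketch of a cancellable `call_later()` and a repeating `call_every()` (the names come from this thread; the signatures are assumptions), plus a hypothetical `PendingConnection` that arms a timeout and cancels it once the connect completes. Time is passed in explicitly so the owning IO loop chooses the clock.

```python
import heapq
import itertools


class TimerHandle:
    """Handle for a scheduled call (or a whole call_every series)."""

    def __init__(self):
        self.cancelled = False

    def cancel(self):
        self.cancelled = True


class Scheduler:
    """Hypothetical scheduler an IO loop would own and drive via run_due()."""

    def __init__(self):
        self._heap = []  # (deadline, seq, handle, callback)
        self._seq = itertools.count()  # tie-breaker so entries never compare handles

    def call_later(self, now, delay, callback, handle=None):
        if handle is None:
            handle = TimerHandle()
        heapq.heappush(self._heap, (now + delay, next(self._seq), handle, callback))
        return handle

    def call_every(self, now, interval, callback):
        def tick(deadline):
            callback(deadline)
            # Reschedule from the previous deadline, not "now", to avoid drift;
            # reusing the handle means one cancel() stops the whole series.
            self.call_later(deadline, interval, tick, handle)

        handle = self.call_later(now, interval, tick)
        return handle

    def run_due(self, now):
        while self._heap and self._heap[0][0] <= now:
            deadline, _, handle, callback = heapq.heappop(self._heap)
            if not handle.cancelled:  # cancelled entries are dropped lazily
                callback(deadline)


class PendingConnection:
    """Hypothetical non-blocking connect that gives up after `timeout` seconds."""

    def __init__(self, scheduler, now, timeout):
        self.state = "connecting"
        self._timer = scheduler.call_later(now, timeout, self._on_timeout)

    def on_connected(self):
        # The IO layer would call this once the non-blocking connect() completes.
        if self.state == "connecting":
            self.state = "connected"
            self._timer.cancel()

    def _on_timeout(self, deadline):
        self.state = "timed out"  # a real handler would also close the socket
```

Cancelled timers are skipped when they come due rather than removed from the heap, which keeps `cancel()` constant-time at the cost of dead entries lingering until their deadline. A bandwidth limiter fits the same mold: a `call_every()` tick that refills a per-connection byte budget.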

Regards,
 - Josiah


