[Python-Dev] Pythonic concurrency - cooperative MT

Mon Oct 3 07:53:52 CEST 2005

On 10/2/05, Christopher Armstrong <radeex at gmail.com> wrote:
> On 10/3/05, Martin Blais <blais at furius.ca> wrote:
> > On 10/1/05, Antoine <solipsis at pitrou.net> wrote:
> > >
> > > > like this with their "deferred objects", no?  I figure they would
> > > > need to do something like this too.  I will have to check.)
> > >
> > > A Deferred object is just the abstraction of a callback - or, rather, two
> > > callbacks: one for success and one for failure. Twisted is architected
> > > around an event loop, which calls your code back when a registered event
> > > happens (for example when an operation is finished, or when some data
> > > arrives on the wire). Compared to generators, it is a different way of
> > > expressing cooperative multi-threading.
> >
> > So, the question is, in Twisted, if I want to defer on an operation
> > that is going to block, say I'm making a call to run a database query
> > that I'm expecting will take much time, and want to yield ("defer")
> > for other events to be processed while the query is executed, how do I
> > do that?  As far as I remember the Twisted docs I read a long time ago
> > did not provide a solution for that.
>
> Deferreds don't make blocking code non-blocking; they're just a way to
> make it nicer to write non-blocking code. There are utilities in
> Twisted for wrapping a blocking function call in a thread and having
> the result returned in a Deferred, though (see deferToThread). There
> is also a lightweight and complete wrapper for DB-API2 database
> modules in twisted.enterprise.adbapi, which does the threading
> interaction for you.
>
> So, since this then exposes a non-blocking API, you can do stuff like
>
> d = pool.runQuery('SELECT User_ID FROM Users')
> d.addCallback(gotDBData)
> d2 = ldapfoo.getUser('bob')
> d2.addCallback(gotLDAPData)
>
> And both the database call and the ldap request will be worked on concurrently.

Very nice!

However, if you're using a thread just for that, you're using only a
part of what threads were designed for: you're really just relying on
the kernel's low-level knowledge of when a resource becomes ready in
order to wait on it, since you're not going to run much actual code in
the thread itself (apart from setting up the blocking call and
returning its value).
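
To make the point concrete, here is a minimal sketch of that pattern.
It uses the standard library's concurrent.futures instead of Twisted
(my substitution, so the example is self-contained), and
blocking_query() and run_in_helper() are hypothetical names standing
in for a slow database call and a thread-wrapping helper.  Note how
little code actually runs in the helper thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(sql):
    # Hypothetical stand-in for a slow, blocking database call.
    time.sleep(0.1)
    return [("row for", sql)]


def run_in_helper(pool, func, *args):
    # The helper thread does nothing but make the blocking call and
    # hand back its value through the returned Future.
    return pool.submit(func, *args)


with ThreadPoolExecutor(max_workers=2) as pool:
    # Both calls are in flight at the same time, one per helper thread.
    f1 = run_in_helper(pool, blocking_query, "SELECT 1")
    f2 = run_in_helper(pool, blocking_query, "SELECT 2")
    results = [f1.result(), f2.result()]
```

The threads here spend essentially all of their time asleep inside
the blocking call; they exist only so that something can wait.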

Now, if we had something in the language that allowed us to do
something like that (make the most important potentially blocking
calls asynchronously), we could implement a more complete scheduler
that could really leverage generators to create a more interesting
concurrency solution with less overhead.  For example, imagine that
some class of generators is used as tasks, like we were discussing
before.  When you call a special yield_read() function (a variation
on e.g. os.read()), there is an implicit yield that lets other
generators which are ready run until the data is available, without
the overhead of

1. context switching to the helper threads and back;
2. synchronization for communication with the helper threads (I assume
threads would not be created dynamically, for efficiency.  I imagine
there is a pool of helpers waiting to do the async call jobs, and
communication with them to dispatch the call jobs does not come for
free (i.e. locking)).
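
A rough sketch of what such a scheduler could look like, assuming a
Unix-like system where select() works on plain file descriptors.
Everything here is hypothetical: yield_read(), ReadRequest and run()
are names I made up for illustration, the yield is explicit rather
than implicit, and only one task may wait on a given descriptor.

```python
import os
import select
from collections import deque


class ReadRequest:
    """Yielded by a task: 'resume me once fd is readable'."""

    def __init__(self, fd, nbytes):
        self.fd = fd
        self.nbytes = nbytes


def yield_read(fd, nbytes):
    # Hypothetical helper; a task writes: data = yield yield_read(fd, n)
    return ReadRequest(fd, nbytes)


def run(tasks):
    ready = deque((task, None) for task in tasks)  # (generator, value to send)
    waiting = {}                                   # fd -> (generator, request)
    while ready or waiting:
        if waiting:
            # Poll if other tasks can run, otherwise block until data arrives.
            timeout = 0 if ready else None
            readable, _, _ = select.select(list(waiting), [], [], timeout)
            for fd in readable:
                task, request = waiting.pop(fd)
                ready.append((task, os.read(fd, request.nbytes)))
        task, value = ready.popleft()
        try:
            request = task.send(value)
        except StopIteration:
            continue                     # task finished
        if isinstance(request, ReadRequest):
            waiting[request.fd] = (task, request)
        else:
            ready.append((task, None))   # plain yield: just reschedule


def reader(name, fd, log):
    data = yield yield_read(fd, 1024)    # other tasks run while we wait
    log.append((name, data))


def ticker(log):
    for i in range(2):
        log.append(("tick", i))
        yield
```

Calling run([reader("r", fd, log), ticker(log)]) interleaves the two
tasks in a single thread; the only waiting is done by select().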

We really don't need threads at all to do that (at least for the
common blocking calls), just some low-level support for building a
scheduler.  Using threads to do that has a cost; in that context it is
more or less a kludge (but we have nothing better for now).

cheers,