[Python-Dev] microthreading vs. async io

Thu Feb 15 12:37:14 CET 2007

dustin at v.igoro.us wrote:

[...]
> microthreading:
>   Exploiting language features to use cooperative multitasking in tasks
>   that "read" like they are single-threaded.
>
> asynchronous IO:
>   Performing IO to/from an application in such a way that the
>   application does not wait for any IO operations to complete, but
>   rather polls for or is notified of the readiness of any IO operations.
>
>   
[...]
> Asyncore *only* implements asynchronous IO -- any "tasks" performed in
> its context are the direct result of an IO operation, so it's hard to
> say it implements cooperative multitasking (and Josiah can correct me if
> I'm wrong, but I don't think it intends to).
>
> Much of the discussion here has been about creating a single, unified
> asynchronous IO mechanism that would support *any* kind of cooperative
> multitasking library.  I have opinions on this ($0.02 each, bulk
> discounts available), but I'll keep them to myself for now.
>   
Talking only about async I/O as the way to write cooperative tasks that
"smell" single-threaded is too restrictive, IMO.

If there are a number of cooperative tasks that "read" single-threaded
(or sequential), then the goal is to avoid a _blocking operation_ in any
of them, because the other tasks could do useful things in the meantime.

But IO is not the only kind of blocking operation. Besides IO (which is
easily handled by select()), a task can also block while:

- waiting for a child process to exit
- waiting for a posix thread to join()
- waiting for a signal/timer
- ...

Kevent (kernel event) on BSD, for example, tries to provide a common
infrastructure: a file descriptor onto which one can push conditions and
then select() until one of them is met. Unfortunately, thread joining is
not covered by it, so if a thread join is among the conditions, one
cannot wait for "any of these" without some form of busy looping. For
all the other cases it would be possible.

There are many other similar approaches (libevent, notify, to name a few).

So in order to avoid blocking in a task, I'd prefer that the task:

- declaratively specifies what kind of conditions (events) it wants to
  wait for. (API)

If that declaration is a function call, then this function could
implicitly yield if the underlying implementation is Stackless or
greenlet based.

Kevent on BSD systems already has a usable API for defining the
conditions by structures, and there is also a Python module for it.

The important point IMO is to have an agreed API for declaring the
conditions a task wants to wait for. The underlying implementation in a
scheduler would be free to use whatever event library it wants to use.

E.g. having a wait(events=[], timeout=-1) method would be sufficient for
most cases, where an event would specify:

- resource type (file, process, timer, signal, ...)
- resource id (fd, process id, timer id, signal number, ...)
- filter/flags (when to fire, e.g. writable, readable, exception for fd, ...)
- ...

The result could be a list of the events that have "fired", more or less
similar to the events in the argument list, but with added information
on the exact condition.
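
As a rough illustration of the event description above, here is a minimal
Python sketch. The names Event, FILE, READABLE and WRITABLE, the field
names, and the body of wait() are assumptions made for this example, not
an agreed API. Only FILE events are handled, via select.select().

```python
import select
import socket
from dataclasses import dataclass

# Hypothetical resource types and filters (illustrative names only).
FILE = "file"
READABLE = "readable"
WRITABLE = "writable"


@dataclass(frozen=True)
class Event:
    rtype: str        # resource type: file, process, timer, signal, ...
    rid: int          # resource id: fd, process id, timer id, ...
    filter: str       # when to fire, e.g. READABLE or WRITABLE
    detail: str = ""  # set on fired events to describe the exact condition


def wait(events, timeout=-1):
    """Block until at least one event fires and return the fired events.

    Sketch only: handles FILE events via select(); timeout < 0 waits forever.
    """
    for e in events:
        if e.rtype != FILE:
            raise NotImplementedError(e.rtype)
    rlist = [e.rid for e in events if e.filter == READABLE]
    wlist = [e.rid for e in events if e.filter == WRITABLE]
    r, w, _ = select.select(rlist, wlist, [], None if timeout < 0 else timeout)
    fired = [Event(FILE, fd, READABLE, "data available") for fd in r]
    fired += [Event(FILE, fd, WRITABLE, "buffer space available") for fd in w]
    return fired


def demo():
    a, b = socket.socketpair()
    try:
        b.sendall(b"ping")
        fired = wait([Event(FILE, a.fileno(), READABLE)], timeout=1.0)
        # wait() only reports readiness; the caller does the read itself.
        data = a.recv(4) if fired else b""
        return fired, data
    finally:
        a.close()
        b.close()
```

Other resource types (process, timer, signal) would need a different
backend than select(); the point here is only the shape of the call.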

The task would return from wait(events) when at least one of the
conditions is met. The task then knows, e.g., that an fd is readable and
can do the read() on its own, in whatever way it likes, without being
forced to let some uber-framework do the low-level IO. The important
part is just waiting for conditions without blocking the application.
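
To make "waiting without blocking the application" concrete, here is a
small sketch of a scheduler in which each task declares what it waits
for and then does its own IO. It uses plain generators, so the yield is
explicit rather than implicit as it would be with greenlets, and the
standard selectors module as the event backend. The run() function and
the (socket, mask) convention are assumptions made for this example.

```python
import selectors
import socket
from collections import deque


def run(tasks):
    """Run generator tasks; each task yields (sock, mask) to wait on.

    Sketch only: a socket may be waited on by one task at a time.
    """
    sel = selectors.DefaultSelector()
    ready = deque(tasks)  # tasks that can run right now
    waiting = 0           # number of tasks parked in the selector
    while ready or waiting:
        while ready:
            task = ready.popleft()
            try:
                sock, mask = next(task)  # run until the task waits again
            except StopIteration:
                continue                 # task finished
            sel.register(sock, mask, task)
            waiting += 1
        if waiting:
            # Block only here, when no task is runnable.
            for key, _ in sel.select():
                sel.unregister(key.fileobj)
                waiting -= 1
                ready.append(key.data)
    sel.close()


def demo():
    log = []
    a, b = socket.socketpair()

    def reader():
        yield a, selectors.EVENT_READ   # declare: wake me when a is readable
        log.append(a.recv(16))          # the task does its own read

    def writer():
        yield b, selectors.EVENT_WRITE  # declare: wake me when b is writable
        log.append("writer runs")
        b.sendall(b"hello")

    try:
        run([reader(), writer()])
    finally:
        a.close()
        b.close()
    return log
```

The reader is parked while the writer runs, and the scheduler never
touches the data itself.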

I have implemented something like the above, based on greenlets.

In addition to the event types specified by BSD kevent(2), I've added
TASK and CHANNEL resource types for the events, so that I can wait for
tasks to complete or send/receive messages to/from other tasks without
blocking the application.
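
The following is not my greenlet-based code, just a small generator-based
sketch of how TASK and CHANNEL events could look to a task. The names
task_done(), channel_recv(), Channel and run() are hypothetical, and
send() never blocks here, which a real channel might.

```python
from collections import deque


# Hypothetical event constructors (illustrative names only).
def task_done(task):
    return ("TASK", task)


def channel_recv(chan):
    return ("CHANNEL", chan)


class Channel:
    def __init__(self):
        self.items = deque()

    def send(self, item):
        self.items.append(item)  # never blocks in this sketch


def run(tasks):
    """Round-robin scheduler; tasks yield the event they wait for."""
    ready = deque((t, None) for t in tasks)  # (task, value to resume with)
    parked = []                              # (task, event) pairs
    finished = set()
    while ready or parked:
        if not ready:
            raise RuntimeError("deadlock: all tasks are waiting")
        task, value = ready.popleft()
        try:
            event = task.send(value)  # run until the next wait
        except StopIteration:
            finished.add(task)
        else:
            parked.append((task, event))
        # Wake every parked task whose condition is now met.
        still_parked = []
        for t, (kind, res) in parked:
            if kind == "TASK" and res in finished:
                ready.append((t, None))
            elif kind == "CHANNEL" and res.items:
                ready.append((t, res.items.popleft()))
            else:
                still_parked.append((t, (kind, res)))
        parked = still_parked


def demo():
    log = []
    chan = Channel()

    def worker():
        msg = yield channel_recv(chan)  # wait for a message
        log.append(("worker got", msg))

    def main(w):
        chan.send("job")
        yield task_done(w)              # wait for the worker to complete
        log.append("worker finished")

    w = worker()
    run([w, main(w)])
    return log
```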

But the implementation is not the important thing; the API is. Once we
agree on that, we can start writing competing implementations.

Joachim