[Python-Dev] microthreading vs. async io

dustin at v.igoro.us dustin at v.igoro.us
Thu Feb 15 17:36:21 CET 2007


On Thu, Feb 15, 2007 at 04:51:30PM +0100, Joachim König-Baltes wrote:
> The style used in asyncore, inheriting from a class and calling return
> in a method and being called later at a different location (different
> method) just interrupts the sequential flow of operations and makes it
> harder to understand. The same is true for all other strategies using
> callbacks or similar mechanisms.
> 
> All this can be achieved with a multilevel yield() that is hidden in a
> function call. So the task does a small step down (wait) in order to
> jump up (yield) to the scheduler without disturbing the eye of the
> beholder.

I agree -- I find that writing continuations or using asyncore's
structure makes spaghetti out of functionality that requires multiple
blocking operations inside looping or conditional statements.  The best
example, for me, was writing a complex site-specific web spider that had
to fetch 5-10 pages in a certain sequence, where each step in that
sequence depended on the results of the previous fetches.  I wrote it in
Twisted, but the proliferation of nested callback functions and chained
deferreds made my head explode while trying to debug it.  With a decent
microthreading library, that could look like:

def fetchSequence(...):
  fetcher = Fetcher()
  yield fetcher.fetchHomepage()
  firstData = yield fetcher.fetchPage('http://...')
  if someCondition(firstData):
    while True:
      secondData = yield fetcher.fetchPage('http://...')
      # ...
      if someOtherCondition(secondData):
        break
  else:
    # ...
    pass
which is *much* easier to read and debug.  (FWIW, after I put my head
back together, I rewrote the app with threads, and it now looks like the
above, without the yields.  The problem is that throttling on fetches
means 99% of my threads are blocked on sleep() at any given time, which
is just silly.)
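
To make the idea concrete, here is a minimal sketch of a scheduler that
could drive generators written in that style.  It is only an illustration
and not the API of any existing library: FakeFetcher, fetch_sequence,
perform and run_all are all hypothetical names, and perform() answers
every request immediately instead of doing real IO.

```python
from collections import deque


class FakeFetcher:
    """Hypothetical stand-in for Fetcher: it returns request tuples, not data."""

    def fetchPage(self, url):
        return ("fetch", url)


def fetch_sequence(fetcher, urls):
    """A microthread: each yield hands a request to the scheduler and waits."""
    pages = []
    for url in urls:
        page = yield fetcher.fetchPage(url)
        pages.append(page)
    return pages


def perform(request):
    """Toy IO layer (an assumption): answers a request at once, no network."""
    kind, url = request
    return "<body of %s>" % url


def run_all(tasks, perform):
    """Round-robin scheduler: advance each generator one yield at a time."""
    queue = deque((task, None) for task in tasks)
    results = {}
    while queue:
        task, value = queue.popleft()
        try:
            # Resume the task with the answer to its previous request.
            request = task.send(value)
        except StopIteration as stop:
            results[task] = stop.value  # the generator's return value
            continue
        queue.append((task, perform(request)))
    return results
```

The tasks take turns at every yield, so the sequential logic stays in one
function while the scheduler decides when each step actually runs.  A real
scheduler would park a task until its IO completes rather than call
perform() inline.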

All that said, I continue to contend that microthreading and async IO
are separate concerns.  The above could be implemented relatively
easily in Twisted with a variant of the microthreading module Phillip
posted earlier.  It could also be implemented atop a bare-bones
microthreading module with Fetcher using asyncore on the backend, or
even by scheduling urllib.urlopen() calls onto OS threads.  Presumably,
it could run in NanoThreads and Kamaelia too, among others.
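
As a rough sketch of that separation, the same generator can be driven by
two different backends without changing a line of it.  Everything here is
an assumption made for illustration: spider, fake_fetch, run_blocking and
run_in_threads are hypothetical names, and fake_fetch stands in for a real
blocking call such as urllib.urlopen().  The thread pool comes from the
standard concurrent.futures module.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def spider(start):
    """Microthread: yields URLs it wants fetched, receives page bodies back."""
    first = yield start
    second = yield first + "/next"
    return [first, second]


def fake_fetch(url):
    """Hypothetical blocking fetch; stands in for real network IO."""
    return url.upper()


def run_blocking(task, fetch):
    """Backend 1: fulfil each request inline in the calling thread."""
    value = None
    while True:
        try:
            request = task.send(value)
        except StopIteration as stop:
            return stop.value
        value = fetch(request)


def run_in_threads(tasks, fetch, workers=4):
    """Backend 2: hand each blocking fetch to a pool of OS threads."""
    results = [None] * len(tasks)
    pending = {}  # future -> index of the task waiting on it

    with ThreadPoolExecutor(max_workers=workers) as pool:
        def step(index, value):
            # Generators are only ever resumed from this (the main) thread.
            try:
                request = tasks[index].send(value)
            except StopIteration as stop:
                results[index] = stop.value
            else:
                pending[pool.submit(fetch, request)] = index

        for i in range(len(tasks)):
            step(i, None)
        while pending:
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for future in done:
                step(pending.pop(future), future.result())
    return results
```

Both run_blocking(spider("a"), fake_fetch) and
run_in_threads([spider("a")], fake_fetch) drive the identical spider()
body; only the code that answers the yielded requests differs.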

What I want is a consistent syntax for microthreaded code, so that I
could write my function once and run it in *all* of those circumstances.

Dustin

P.S. For the record -- I've written lots of other apps in Twisted with
great success; this one just wasn't a good fit.

