[Python-Dev] Pythonic concurrency

Shane Hathaway shane at hathawaymix.org
Thu Sep 29 21:31:19 CEST 2005


Bruce Eckel wrote:
> I'd like to restart this discussion; I didn't mean to put forth active
> objects as "the" solution, only that it seems to be one of the better,
> more OO solutions that I've seen so far.
> 
> What I'd really like to figure out is the "pythonic" solution for
> concurrency. Guido and I got as far as agreeing that it wasn't
> threads.

I've pondered this problem.  Python deals programmers a double whammy 
when it comes to threads: not only is threading as unsafe as it is in 
other languages, but the GIL also prevents you from using multiple 
processors.  Thus there's more pressure to improve concurrency in Python 
than there is elsewhere.

I like to use fork(), but fork has its own set of surprises.  In 
particular, from the programmer's point of view, forking creates a 
disassociated copy of every object except open files, whose 
descriptors remain shared.  Also, there's no Pythonic way for the two 
processes to communicate once the child has started.
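
For illustration, here's roughly what the manual plumbing looks like 
today if you want the parent and child to talk at all; none of this is 
a proposed API, just os.fork(), os.pipe(), and pickle by hand:

     import os, pickle

     read_fd, write_fd = os.pipe()
     pid = os.fork()
     if pid == 0:
         # Child: compute something and pickle it back to the parent.
         os.close(read_fd)
         os.write(write_fd, pickle.dumps(sum(range(100))))
         os._exit(0)
     else:
         # Parent: read the child's answer and wait for it to exit.
         os.close(write_fd)
         data = os.read(read_fd, 65536)
         os.waitpid(pid, 0)
         result = pickle.loads(data)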

It's tempting to create a library around fork() that solves the 
communication problem, but the copied objects are still a major source 
of bugs.  Imagine what would happen if you forked a Zope process with an 
open ZODB.  If both the parent and child change ZODB objects, ZODB is 
likely to corrupt itself, since the processes share file descriptors. 
Thus forking can be just as dangerous as threading.

Therefore, I think a better Python concurrency model would be a lot like 
the subprocess module, but designed for calling Python code.  I can 
already think of several ways I would use such a module.  Something like 
the following would solve problems I've encountered with threads, 
forking, and the subprocess module:

     import pyprocess
     proc = pyprocess.start('mypackage.mymodule', 'myfunc', arg1, arg2=5)
     while proc.running():
         pass  # or do something else in the meantime
     res = proc.result()

This code doesn't specify whether the subprocess should continue to 
exist after the function completes (or throws an exception).  I can 
think of two ways to deal with that:

1) Provide two APIs.  The first API stops the subprocess upon function 
completion.  The second API allows the parent to call other functions in 
the subprocess, but never more than one function at a time.

2) Always leave subprocesses running, but use a 'with' statement to 
guarantee the subprocess will be closed quickly.  I prefer this option.
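
Under the second option, usage might look something like this 
(proc.call() is a made-up method name for invoking further functions 
in the same subprocess; the exact spelling would depend on PEP 343):

     import pyprocess
     with pyprocess.start('mypackage.mymodule', 'myfunc', arg1) as proc:
         first = proc.result()
         # The subprocess stays alive, so we can call into it again.
         second = proc.call('otherfunc', arg2=5)
     # Leaving the 'with' block shuts the subprocess down promptly.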

I think my suggestion fits most of your objectives.

> 1) It works by default, so that novices can use it without falling
> into the deep well of threading. That is, a program that you write
> using threading is broken by default, and the tool you have to fix it
> is "inspection." I want something that allows me to say "this is a
> task. Go." and have it work without the python programmer having to
> study and understand several tomes on the subject.

Done, IMHO.

> 2) Tasks can be automatically distributed among processors, so it
> solves the problems of (a) making python run faster (b) how to utilize
> multiprocessor systems.

Done.  The OS automatically maps subprocesses to other processors.

> 3) Tasks are cheap enough that I can make thousands of them, to solve
> modeling problems (in which I also lump games). This is really a
> solution to a certain type of program complexity -- if I can just
> assign a task to each logical modeling unit, it makes such a system
> much easier to program.

Perhaps the suggested module should have a queue-oriented API.  Usage 
would look like this:

     import pyprocess
     queue = pyprocess.ProcessQueue(max_processes=4)
     task = queue.put('mypackage.mymodule', 'myfunc', arg1, arg2=5)

Then, you can create as many tasks as you like; parallelism will be 
limited to 4 concurrent tasks.  A variation of ProcessQueue might manage 
the concurrency limit automatically.
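
To make that concrete, creating thousands of tasks might look like 
this (task.result() and the 'simulate' function are invented here, by 
analogy with the example above):

     import pyprocess
     queue = pyprocess.ProcessQueue(max_processes=4)
     # One task per modeling unit; only four run at any given moment.
     tasks = [queue.put('mypackage.mymodule', 'simulate', unit)
              for unit in range(10000)]
     results = [task.result() for task in tasks]  # blocks as needed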

> 4) Tasks are "self-guarding," so they prevent other tasks from
> interfering with them. The only way tasks can communicate with each
> other is through some kind of formal mechanism (something queue-ish,
> I'd imagine).

Done.  Subprocesses have their own Python namespace.  Subprocesses 
receive messages through function calls and send messages by returning 
from functions.
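
Under the hood, the child side could be nothing more than a loop that 
unpickles a request, imports and calls the named function, and pickles 
the return value back.  A minimal sketch, assuming one pipe per 
subprocess and pickle as the wire format (neither of which is settled):

     import pickle

     def child_loop(infile, outfile):
         # Serve requests until the parent closes its end of the pipe.
         while True:
             try:
                 modname, funcname, args, kwargs = pickle.load(infile)
             except EOFError:
                 break
             module = __import__(modname, {}, {}, [funcname])
             func = getattr(module, funcname)
             # The return value is the reply message.
             pickle.dump(func(*args, **kwargs), outfile)
             outfile.flush()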

> 5) Deadlock is prevented by default. I suspect livelock could still
> happen; I don't know if it's possible to eliminate that.

No locking is done at all.  (That makes me uneasy, though; have I just 
moved locking problems to the application developer?)

> 6) It's natural to make an object that is actor-ish. That is, this
> concurrency approach works intuitively with objects.

Anything pickleable is legal as an argument or return value.
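
For example, an actor-ish object is just a pickleable instance handed 
to the subprocess and handed back; everything below (Particle, 
'mypackage.physics', 'step') is made up for illustration:

     import pyprocess

     class Particle:
         def __init__(self, x, y):
             self.x, self.y = x, y

     # The instance is pickled into the subprocess, 'step' updates it
     # there, and the updated object comes back as the return value.
     # (Particle would need to be importable by the subprocess too.)
     p = Particle(0.0, 0.0)
     proc = pyprocess.start('mypackage.physics', 'step', p, dt=0.1)
     p = proc.result()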

> 7) Complexity should be eliminated as much as possible. If it requires
> greater limitations on what you can do in exchange for a clear,
> simple, and safe programming model, that sounds pythonic to me. The
> way I see it, if we can't easily use tasks without getting into
> trouble, people won't use them. But if we have a model that allows
> people to (for example) make easy use of multiple processors, they
> will use that approach and the (possible) extra overhead that you pay
> for the simplicity will be absorbed by the extra CPUs.

I think the solution is very simple.

> 8) It should not exclude the possibility of mobile tasks/active
> objects, ideally with something relatively straightforward such as
> Linda-style tuple spaces.

The proposed module could serve as a guide for a very similar module 
that sends tasks to other machines.

Shane

