parallel processing in standard library

Fri Dec 28 10:23:06 EST 2007

On Dec 27, 2007 4:13 PM, Robert Kern <robert.kern at gmail.com> wrote:

> Emin.shopper Martinian.shopper wrote:
> > If not, is there any hope of something like
> > the db-api for coarse grained parallelism (i.e., a common API that
> > different toolkits can support)?
>
> The problem is that for SQL databases, there is a substantial API that they
> can all share. The implementations are primarily differentiated by other
> factors like speed, in-memory or on-disk, embedded or server, the flavor of
> SQL, etc. and only secondarily differentiated by their extensions to the
> DB-API. With parallel processing, the API itself is a key differentiator
> between toolkits and approaches. Different problems require different APIs,
> not just different implementations.

I disagree. Most of the implementations of coarse-grained parallelism I have
seen and used share many features. For example, they generally have a notion
of spawning processes/tasks, scheduling/load-balancing, checking tasks on a
server, sending messages to/from tasks, detecting when tasks finish or die,
logging the results for debugging purposes, etc. Sure they all do these
things in slightly different ways, but for coarse-grained parallelism the
API differences rarely matter (although the implementation differences can
matter).
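
To make this concrete, here is a minimal sketch of the kind of shared
interface I have in mind. The names TaskRunner, ThreadRunner, and run_all are
hypothetical and not taken from any existing toolkit, and the thread-based
backend is only a stand-in so the example runs on its own; a real toolkit
would presumably spawn processes or remote jobs behind the same calls.

```python
import threading
from abc import ABC, abstractmethod


class TaskRunner(ABC):
    """Hypothetical common API: spawn a task, poll it, collect its result."""

    @abstractmethod
    def spawn(self, func, *args):
        """Start func(*args) and return an opaque task handle."""

    @abstractmethod
    def is_done(self, handle):
        """Return True once the task has finished or died."""

    @abstractmethod
    def result(self, handle):
        """Block until the task ends; return its value or re-raise its error."""


class ThreadRunner(TaskRunner):
    """Stand-in backend using threads (an assumption made for illustration)."""

    def spawn(self, func, *args):
        handle = {"value": None, "error": None}

        def target():
            try:
                handle["value"] = func(*args)
            except Exception as exc:
                # Record the failure so the caller can tell the task died.
                handle["error"] = exc

        handle["thread"] = threading.Thread(target=target)
        handle["thread"].start()
        return handle

    def is_done(self, handle):
        return not handle["thread"].is_alive()

    def result(self, handle):
        handle["thread"].join()
        if handle["error"] is not None:
            raise handle["error"]
        return handle["value"]


def run_all(runner, func, inputs):
    """Client code written only against TaskRunner, not a specific backend."""
    handles = [runner.spawn(func, x) for x in inputs]
    return [runner.result(h) for h in handles]
```

Code like run_all depends only on the three abstract methods, so a different
backend could in principle be swapped in without changing it.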

> I suspect that one of the smaller implementations like processing.py might
> get adopted into the standard library if the author decides to push for it.

That would be great.

> My recommendation to you is to pick one of the smaller implementations that
> solves the problems in front of you. Read and understand that module so you
> could maintain it yourself if you had to. Post to this list about how you
> use it. Blog about it if you blog. Write some Python Cookbook recipes to
> show how you solve problems with it.

That is a good suggestion, but for most of the coarse-grained parallelism
tasks I've worked on it would be easier to roll my own system than do that.
To put it another way, why spend the effort to use a particular API if I
don't know it's going to be around for a while? Since a lot of the value is
in the API as opposed to the implementation, unless there is something
special about the API (e.g., it is an official or at least de facto
standard) the learning curve may not be worth it.

> If there is a lively community around it, that will help it get into the
> standard library. Things get into the standard library *because* they are
> supported, not the other way around.

You make a good point and in general I would agree with you. Isn't it
possible, however, that there are cases where inclusion in the standard
library would build a better community? I think this is the argument for
many types of standards. A good example is wireless networking. The
development of a standard like 802.11 provided hardware manufacturers the
incentive to build devices that could communicate with each other and that
made people want to buy the products.

Still, I take your basic point to heart: if I want a good API, I should get
off my butt and contribute to it somehow.

How would you or the rest of the community react to a proposal for a generic
parallelism API? I suspect the response would be "show us an implementation
of the code". I could whip up an implementation or adapt one of the existing
systems, but then I worry that the discussion would devolve into an argument
about the pros and cons of the particular implementation instead of the API.
Even worse, it might devolve into an argument about the value of fine-grained
vs. coarse-grained parallelism or the GIL. Considering that these issues
seem to have been discussed quite a bit already and there are already
multiple parallel processing implementations, it seems like the way forward
lies in either a blessing of a particular package that already exists or
adoption of an API instead of a particular implementation.

Thanks for your thoughts,
-Emin