Ending data exchange through multiprocessing pipe

Paul Boddie paul at boddie.org.uk
Thu Apr 23 09:02:36 EDT 2009


On 22 Apr, 17:43, Michal Chruszcz <mchrus... at gmail.com> wrote:
>
> I am adding support for parallel processing to an existing program
> which fetches some data and then performs some computation with
> results saved to a database. Everything went just fine until I wanted
> to gather all of the results from the subprocesses.

[Queue example]

I have to say that I'm not familiar with the multiprocessing API, but
for this kind of thing, something reasonably complicated has to happen
in the background to test for readable conditions on the underlying
pipes or sockets. In the pprocess module [1], I had to implement a
poll-based framework (probably quite similar to Twisted and asyncore)
to avoid deadlocks and other undesirable conditions.
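
As a rough illustration of what testing for readable conditions looks
like with multiprocessing itself (this is not pprocess code, and
receive_available is a hypothetical helper name), a connection can be
polled before each read so that the reader never blocks indefinitely.
The sketch assumes a current Python 3:

```python
from multiprocessing import Pipe


def receive_available(conn, timeout=0.1):
    """Collect objects from conn until it stays quiet for `timeout` seconds."""
    items = []
    # poll() returns True when data is waiting. It also returns True once
    # the other end has been closed (at least on POSIX systems), in which
    # case recv() raises EOFError and we stop.
    while conn.poll(timeout):
        try:
            items.append(conn.recv())
        except EOFError:
            break
    return items


if __name__ == "__main__":
    # In-process demonstration: both ends live in the same process here,
    # whereas normally the writer would belong to a child process.
    reader, writer = Pipe(duplex=False)
    writer.send("first")
    writer.send("second")
    print(receive_available(reader))
```

The timeout is the design choice to watch: too short and slow writers
are mistaken for finished ones, too long and the reader idles.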

[Pipe example]

Again, it's really awkward to monitor pipes between processes and to
have them "go away" when closed. Indeed, I found that you don't really
want them to disappear before everyone has finished reading from them,
but Linux (at least) tends to break pipes quite readily. I got round
this problem by having acknowledgements in pprocess, but it felt like
a hack.
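
One way to let the reading side notice that a pipe has gone away
without losing data, sketched here for multiprocessing rather than
pprocess and assuming a current Python 3, is to rely on recv() raising
EOFError once every copy of the writing end has been closed. The names
producer and drain are hypothetical, and this does not reproduce the
acknowledgement scheme used in pprocess:

```python
from multiprocessing import Pipe, Process


def producer(conn, count):
    """Send `count` integers, then close our end to signal completion."""
    for i in range(count):
        conn.send(i)
    conn.close()


def drain(conn):
    """Read from conn until the writing end has been closed everywhere."""
    results = []
    while True:
        try:
            results.append(conn.recv())
        except EOFError:
            break
    return results


def main():
    reader, writer = Pipe(duplex=False)
    child = Process(target=producer, args=(writer, 5))
    child.start()
    # The parent must close its own copy of the writing end. Otherwise
    # the pipe never reports end-of-file and drain() blocks forever.
    writer.close()
    print(drain(reader))
    child.join()


if __name__ == "__main__":
    main()
```

Data already written is still delivered before EOFError is raised, so
the child can close its end as soon as it has finished sending.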

> Most possibly I'm missing something in philosophy of multiprocessing,
> but I couldn't find anything covering such a situation. I'd appreciate
> any kind of hint on this topic, as it became a riddle I just have to
> solve. :-)

The multiprocessing module appears to offer map-based conveniences
(Pool objects) where you indicate that you want the same callable
executed multiple times and the results to be returned, so perhaps
this is really what you want. In pprocess, there's a certain amount of
flexibility exposed in the API, so that you can choose to use a
map-like function, or you can open everything up and use the
communications primitives directly (which would appear to be similar
to the queue-oriented programming mentioned in the multiprocessing
documentation).
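
A minimal sketch of the map-based style with multiprocessing, assuming
a current Python 3; compute is a hypothetical stand-in for the real
per-item work, and run is likewise just an illustrative name:

```python
from multiprocessing import Pool


def compute(item):
    """Hypothetical stand-in for the per-item fetch-and-compute step."""
    return item * item


def run(items, processes=2):
    """Apply compute() to every item in worker processes and gather results."""
    with Pool(processes=processes) as pool:
        # map() blocks until all results are available and returns them
        # in the same order as the inputs.
        return pool.map(compute, items)


if __name__ == "__main__":
    print(run(range(5)))
```

With this style there are no pipes or queues to close by hand: the
results simply come back as the return value of map().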

One thing that pprocess exposes (and which must be there in some form
in the multiprocessing module) is the notion of an "exchange":
something that monitors a number of communications channels between
processes and is able to detect and act upon readable channels in an
efficient way. If it's not the Pool class in multiprocessing that
supports such things, I'd look for the component which does support
them, if I were you, because this seems to be the functionality you
need.
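
In later versions of Python (3.3 onwards), the function
multiprocessing.connection.wait() appears to fill this role: it blocks
until at least one of a set of connections is readable. A minimal
sketch of an exchange-like loop built on it follows; collect is a
hypothetical helper, not part of multiprocessing or pprocess:

```python
from multiprocessing import Pipe
from multiprocessing.connection import wait


def collect(readers):
    """Gather objects from several connections until all of them report EOF."""
    pending = list(readers)
    results = []
    while pending:
        # wait() returns the subset of connections that are ready to read.
        for conn in wait(pending):
            try:
                results.append(conn.recv())
            except EOFError:
                # The writing end was closed: stop monitoring this channel.
                pending.remove(conn)
                conn.close()
    return results


if __name__ == "__main__":
    # In-process demonstration with two channels. Normally each writer
    # would be handed to a child process and closed in the parent.
    r1, w1 = Pipe(duplex=False)
    r2, w2 = Pipe(duplex=False)
    w1.send("from channel 1")
    w2.send("from channel 2")
    w1.close()
    w2.close()
    print(collect([r1, r2]))
```

The order of the collected results depends on which channels become
readable first, so callers should not rely on it.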

Paul

[1] http://www.boddie.org.uk/python/pprocess.html


