Pickle based workflow - looking for advice

Chris Angelico rosuav at gmail.com
Tue Apr 14 10:30:01 EDT 2015


On Wed, Apr 15, 2015 at 12:14 AM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> On Tue, 14 Apr 2015 11:45 pm, Chris Angelico wrote:
>
>> On Tue, Apr 14, 2015 at 11:08 PM, Steven D'Aprano
>> <steve+comp.lang.python at pearwood.info> wrote:
>>> On Tue, 14 Apr 2015 05:58 pm, Fabien wrote:
>>>
>>>> On 14.04.2015 06:05, Chris Angelico wrote:
>>>>> Not sure what you mean, here. Any given file will be written by
>>>>> exactly one process? No possible problem. Multiprocessing within one
>>>>> application doesn't change that.
>>>>
>>>> yes that's what I meant. Thanks!
>>>
>>> It's not that simple though. If you require files to be written in
>>> precisely a certain order, then parallel processing requires
>>> synchronisation.
>>>
>>> Suppose you write A, then B, then C, then D, each in its own process (or
>>> thread). So the B process has to wait for A to finish, the C process has
>>> to wait for B to finish, and so on. Otherwise you could find yourself
>>> with C reading the data from B before B is finished writing it.
>>
>> Sure, which is a matter of writer/reader conflicts on a single file -
>> nothing to do with "writing multiple files simultaneously" which was
>> the question raised.
>
> Fabien: "So I'm trying to crack open an old grenade I found, and I was
> wondering if I need a ball-peen hammer or whether a regular hammer will be
> okay."
>
> You: "Oh, a regular hammer will be fine."
>
> Me: "Just a minute. You're hitting a grenade with a hammer hard enough to
> crack the case. That could be bad. It might explode."
>
> You: "Sure, but the OP never asked about that. He just asked if the kind of
> hammer makes a difference."
>
> :-P

Heh, fair point. This list is superb at answering the questions people
never even knew to ask.

> Seriously though, the OP did specify in his first post that there is at
> least one dependency of the "B depends on A finishing first" kind. I
> understood that A writes to a file, B reads that file and writes to a new
> file, C reads that file and writes to yet another file, and so on. In which
> case, *writing* the files is the least of his problems, it's the exploding
> grenade, er, synchronisation problems that will get him.
>
> :-)
>
> Apart from "embarrassingly parallel" problems, thread- and
> multiprocessing-based workflows are often trickier than they may seem ahead
> of time, and may even be slower than a purely sequential algorithm:

Yep. The way I read the OP's problem, it's easiest thought of as a
generic request-response system - same as most internet servers.
Basically, you have a piece of code that reacts to an incoming
request, and produces some form of response, then goes back and looks
for the next request. Whether you actually code along those lines or
not, it's a reasonable way to get your head around it.
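
As a rough illustration of that shape, here is a minimal sketch. The names
handle, serve and STOP are made up for the example, and the "work" is just
upper-casing a string; the real handler would do whatever one step of the
OP's workflow needs.

```python
import queue

STOP = object()  # hypothetical sentinel meaning "no more requests"


def handle(request):
    """Hypothetical handler: build one response from one request."""
    return request.upper()


def serve(requests, responses):
    """React to a request, produce a response, then look for the next one."""
    while True:
        request = requests.get()  # blocks until a request is available
        if request is STOP:
            break
        responses.put(handle(request))
```

All the logic lives in handle(); serve() only knows how to fetch a request
and hand back a response, so how the requests arrive can change without
touching the handler.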

If you _do_ code it that way, one big benefit is that you effectively
have a multiprocessable state machine; you can fork out to N processes
to take advantage of your CPU cores, or run in a single process for
debugging, and none of the code cares in the slightest.
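
Something like this, as a sketch only. handle() and run() are hypothetical
names, the squaring stands in for the real work, and I'm assuming each
request can be handled independently of the others (any "B needs A's
output" ordering still has to be dealt with separately).

```python
from multiprocessing import Pool


def handle(request):
    """Hypothetical handler: turn one request into one response."""
    return request * request


def run(requests, processes=1):
    """Handle every request, in this process or across worker processes."""
    if processes == 1:
        # Single-process mode: plain loop, easy to step through in a debugger.
        return [handle(r) for r in requests]
    # Pool.map returns the results in the same order as the inputs.
    with Pool(processes) as pool:
        return pool.map(handle, requests)
```

run(jobs) and run(jobs, processes=4) give the same results, and handle()
neither knows nor cares which one called it. On platforms that start
workers by spawning a fresh interpreter, the parallel call needs to sit
under an "if __name__ == '__main__':" guard.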

ChrisA