multiprocessing queue hangs up on the Amazon cloud

Chris Angelico rosuav at gmail.com
Wed Jan 14 17:16:56 EST 2015


On Thu, Jan 15, 2015 at 8:55 AM,  <jgrant at smith.edu> wrote:
> I am trying to run a series of scripts on the Amazon cloud, multiprocessing on the 32 cores of our AWS instance.  The scripts run well, and the queuing seems to work BUT, although the processes run to completion, the script below that runs the queue never ends.  I have tried Queue and JoinableQueue and the same thing happens for both.
>

Hi! I'm not absolutely sure that this is a problem, but I'm seeing
something in the code that you might have to walk me through.

> def worker(done_queue,work_queue):
>         try:
>                 for f in iter(work_queue.get, 'STOP'):

This repeatedly pulls items off the queue until one of them is the
string 'STOP'.
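
For anyone following along, the two-argument form of iter() calls
work_queue.get() over and over, and stops as soon as a call returns
something equal to the sentinel, here 'STOP'. A quick standalone sketch
of the same pattern (my own throwaway names, not yours):

from multiprocessing import Queue

q = Queue()
for item in ('spam', 'ham', 'eggs', 'STOP'):
    q.put(item)

# iter(q.get, 'STOP') keeps calling q.get() until it returns 'STOP';
# the sentinel itself is consumed but never yielded.
for f in iter(q.get, 'STOP'):
    print(f)        # spam, ham, eggs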

By the way:

>         except Exception, e:

This is an old form of exception syntax. You may want to use "except
Exception as e:" instead, assuming you don't need to support ancient
Python versions. Carrying on!

>         work_queue = JoinableQueue(810) #change if more than 810 taxa
>         work_queue.put('STOP')

There's a single work queue, with a single 'STOP' sentinel on it.

>         for w in xrange(workers):
>                 p = Process(target=worker, args=(done_queue,work_queue))
>                 p.start()
>                 processes.append(p)
>
>         print "3  it  gets here"
>         for p in processes:
>                 print p                     # it only prints once - <Process(Process-1, started)>
>                 p.join()

And then you start multiple workers. If my reading is correct, one of
them (whichever happens to get there first) will read the STOP marker
and finish; the others will all block, waiting for more work that will
never arrive. In theory the first process might be the one that gets
the STOP, in which case you'd successfully join() it and then get stuck
waiting for the second; but if any other process grabs it, you'll be
stuck on the very first join(). Either way, the parent never gets past
that loop. Or have I misunderstood something in your logic?
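
Here's a cut-down, standalone reconstruction of what I think is
happening (two workers, one STOP), with a timeout on the join so it
doesn't sit there forever; the worker body is a stand-in of mine, not
your real one:

from multiprocessing import Process, Queue

def worker(work_queue):
    for item in iter(work_queue.get, 'STOP'):
        pass                      # real work would go here

if __name__ == '__main__':
    work_queue = Queue()
    work_queue.put('STOP')        # only ONE sentinel for two workers

    procs = [Process(target=worker, args=(work_queue,)) for _ in range(2)]
    for p in procs:
        p.start()

    for p in procs:
        p.join(timeout=2)         # a plain join() could block forever here
        print('%s %s' % (p, 'alive' if p.is_alive() else 'finished'))

    work_queue.put('STOP')        # a second sentinel lets the stuck one exit
    for p in procs:
        p.join()

On my reading, one of the two reports finished (whichever grabbed the
STOP) and the other reports alive, blocked inside work_queue.get().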

If this is indeed what's happening, the simplest solution might be to
add as many STOPs as you have workers. Alternatively, if you can
guarantee that all the work is on the queue before the first process
starts, you could simply use the empty queue as the sentinel, which I
would recommend doing for the done_queue anyway, since there's only one
reader for that.
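
To make the first option concrete, here's a rough standalone sketch
with a trivial stand-in worker. The names and sizes don't match your
script (no 810 taxa, no 32 cores), and I've used a plain Queue since
nothing in the quoted fragments seems to need JoinableQueue's
task_done()/join(), so treat this as the shape of the fix rather than
a drop-in replacement:

from multiprocessing import Process, Queue

def worker(done_queue, work_queue):
    # Stand-in for the real worker: consume until a STOP arrives.
    for item in iter(work_queue.get, 'STOP'):
        done_queue.put(item * 2)        # placeholder for the real work

if __name__ == '__main__':
    workers = 4                         # 32 on your AWS instance
    work_queue = Queue()
    done_queue = Queue()

    jobs = list(range(20))              # stand-in for the real work items
    for job in jobs:
        work_queue.put(job)
    for _ in range(workers):            # one STOP per worker, not just one
        work_queue.put('STOP')

    processes = []
    for _ in range(workers):
        p = Process(target=worker, args=(done_queue, work_queue))
        p.start()
        processes.append(p)

    # Drain the results before joining; the multiprocessing docs warn
    # against joining a process that still has unflushed items on a queue.
    results = [done_queue.get() for _ in jobs]

    for p in processes:
        p.join()                        # every worker has now seen a STOP
    print(results)

Reading done_queue by count, as above, also means you don't need a
sentinel on that queue at all.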

But it's entirely possible I've missed some tiny fact that makes my
entire analysis wrong, in which case I apologize for the noise!

ChrisA


