Traceback when using multiprocessing, less than helpful?

Thu Nov 21 12:01:40 EST 2013

Hi folks,

Somewhat over a year ago, I struggled with implementing a routine using multiprocessing.Pool and numpy.  I eventually succeeded, but I remember finding it very hard to debug.  Now I have managed to provoke an error from that routine again, and once again, I'm struggling.

Here is the end of the traceback, starting with the last line of my own code: "result = pool.map(evaluate, bundles)".  Everything after that is inside the Python standard library.

  File ".../evaluate.py", line 81, in evaluate
    result = pool.map(evaluate, bundles)
  File "/usr/lib/python3.3/multiprocessing/pool.py", line 228, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.3/multiprocessing/pool.py", line 564, in get
    raise self._value
ValueError: operands could not be broadcast together with shapes (1,3) (4)

Notice that no line of numpy appears in the traceback?  Still, there are three things that make me think that this error is coming from numpy.

1. "raise self._value" means that an exception was stored in a variable, to be re-raised later.  So the place where it is raised need not be the place where it originated.

2. The words "operands" and "broadcast" do not appear anywhere in the source code of multiprocessing.pool.

3. The words "operands" and "broadcast" are common to numpy errors I have seen before.  Numpy applies broadcasting rules when it combines arrays of different dimensions and shapes, and it raises a ValueError like this one when the shapes are incompatible.
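For what it's worth, the message itself is easy to reproduce without multiprocessing.  This is a minimal sketch, assuming arrays with the same shapes as in my error; a and b are made up for illustration and are not my real data.

```python
import numpy as np

# Made-up arrays with the shapes from my error message, not my real data.
a = np.ones((1, 3))
b = np.ones(4)

message = None
try:
    a + b  # trailing dimensions 3 and 4 are incompatible
except ValueError as exc:
    message = str(exc)
    print(message)
```

Run directly like this, the traceback does point at the offending line, which is exactly the information I am missing when the same error comes back through the pool.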

Of course, I am sure that the bug must be in my own code.  I even have old programs which use my evaluate.evaluate() without generating errors.  I am comparing both the code and the data structures sent to pool.map() between my working and non-working programs, but there is some subtle difference that I haven't spotted yet.

If I could only see the line of numpy code which is generating the ValueError, I would have a better chance of spotting the bug in my code.  So, WHY isn't there any reference to numpy in my traceback?

Here's my theory.  The numpy error was generated in a subprocess.  The exception was caught there and passed back to the master Python interpreter, and the line "raise self._value" is re-raising it in the parent process.
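One piece of this theory can be checked without a pool at all.  As far as I understand it, results and exceptions travel between processes by being pickled, and a pickled exception does not carry its traceback.  The sketch below only simulates that trip with pickle; fail() is a hypothetical stand-in for whatever numpy call fails in my worker.

```python
import pickle


def fail():
    # Hypothetical stand-in for the numpy call that fails in my worker.
    raise ValueError("raised deep inside the worker")


try:
    fail()
except ValueError as exc:
    original = exc

# In the process that raised it, the exception still knows where it came from.
has_traceback_before = original.__traceback__ is not None

# Simulate the trip between processes: pickle, then unpickle.
received = pickle.loads(pickle.dumps(original))
has_traceback_after = received.__traceback__ is not None

print(has_traceback_before, has_traceback_after)
```

The type and the message survive the round trip, but the traceback does not, which would match what I see: the right ValueError, with no trace of where it was raised.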

Does re-raising an exception, and/or passing an exception from a subprocess, truncate a traceback?  That's what I think I'm seeing.
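In the meantime, here is a workaround I could try.  It is a sketch, not a definitive fix.  It wraps the worker function so that the traceback is captured as text inside the subprocess, where it still exists, and carried back in the exception message.  The names evaluate_with_traceback and map_with_tracebacks are my own inventions, and the evaluate() shown here is a deliberately simple stand-in, not my real routine.

```python
import multiprocessing
import traceback


def evaluate(bundle):
    # Hypothetical stand-in for my real evaluate(): fails on bad input.
    return int(bundle)


def evaluate_with_traceback(bundle):
    # Runs in the subprocess.  If evaluate() fails, capture the full
    # traceback as a string while it still exists, and re-raise with that
    # text as the message, since a plain string survives the trip back.
    try:
        return evaluate(bundle)
    except Exception:
        raise RuntimeError(traceback.format_exc())


def map_with_tracebacks(bundles, processes=2):
    # Like pool.map(evaluate, bundles), but routed through the wrapper above.
    with multiprocessing.Pool(processes) as pool:
        return pool.map(evaluate_with_traceback, bundles)
```

Calling map_with_tracebacks(bundles) from under an if __name__ == "__main__": guard should then fail with a RuntimeError whose message contains the worker-side traceback, including the line that actually raised.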

Thanks for any advice!