[Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

Allan Haldane allanhaldane at gmail.com
Wed May 11 14:01:02 EDT 2016


On 05/11/2016 04:29 AM, Sturla Molden wrote:
> 4. The reason IPC appears expensive with NumPy is that multiprocessing
> pickles the arrays. It is pickle that is slow, not the IPC. Some would say
> that the pickle overhead is an integral part of the IPC overhead, but I
> will argue that it is not. The slowness of pickle is a separate problem
> altogether.

That's interesting. I've also used multiprocessing with numpy and didn't
realize that. Is this true in Python 3 too?

In Python 2 it appears that multiprocessing uses pickle protocol 0, which
must cause a big slowdown (roughly a factor of 100) relative to protocol 2,
and that it uses pickle instead of cPickle.

import pickle, cPickle
import numpy as np

a = np.arange(40*40)

%timeit pickle.dumps(a)
1000 loops, best of 3: 1.63 ms per loop

%timeit cPickle.dumps(a)
1000 loops, best of 3: 1.56 ms per loop

%timeit cPickle.dumps(a, protocol=2)
100000 loops, best of 3: 18.9 µs per loop

Python 3 uses protocol 3 by default:

%timeit pickle.dumps(a)
10000 loops, best of 3: 20 µs per loop
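
To repeat the comparison outside IPython, here is a minimal sketch for
Python 3 that times pickle.dumps on the same array under every protocol the
interpreter supports. The helper name time_dumps and the loop count are my
own choices for illustration, and the numbers will differ by machine and by
NumPy version, so treat it as a way to check the trend rather than as a
benchmark.

```python
import pickle
import timeit

import numpy as np

a = np.arange(40 * 40)


def time_dumps(protocol, number=500):
    """Return mean seconds per pickle.dumps(a, protocol=protocol) call.

    Hypothetical helper for this sketch; not part of any library.
    """
    total = timeit.timeit(lambda: pickle.dumps(a, protocol=protocol),
                          number=number)
    return total / number


# One measurement per protocol available in this interpreter.
timings = {p: time_dumps(p) for p in range(pickle.HIGHEST_PROTOCOL + 1)}
# Size of the pickled payload for each protocol, in bytes.
sizes = {p: len(pickle.dumps(a, protocol=p)) for p in timings}

for p in sorted(timings):
    print("protocol %d: %8.1f us per call, %6d bytes"
          % (p, timings[p] * 1e6, sizes[p]))
```

If protocol 0 is the culprit, it should stand out in both columns.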


> 5. Shared memory does not improve on the pickle overhead because NumPy
> arrays with shared memory must also be pickled. Multiprocessing can bypass
> pickling the RawArray object, but the rest of the NumPy array is pickled.
> Using shared memory arrays has no speed advantage over normal NumPy arrays
> when we use multiprocessing.
> 
> 6. It is much easier to write concurrent code that uses queues for message
> passing than anything else. That is why using a Queue object has been the
> popular Pythonic approach to both multithreading and multiprocessing. I
> would like this to continue.
> 
> I am therefore focusing my effort on the multiprocessing.Queue object. If
> you understand the six points I listed you will see where this is going:
> What we really need is a specialized queue that has knowledge about NumPy
> arrays and can bypass pickle. I am therefore focusing my efforts on
> creating a NumPy aware queue object.
> 
> We are not doing the users a favor by encouraging the use of shared memory
> arrays. They help with nothing.
> 
> 
> Sturla Molden
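
One part of point 5 can be checked directly. The sketch below, for Python 3,
wraps a multiprocessing.RawArray in a NumPy view and pickles the view. It
assumes np.frombuffer is how the array was built on top of the shared
buffer, and the variable names are only illustrative. It shows that pickling
the NumPy view serializes all of the data and that unpickling gives an
independent copy; it does not measure what a Queue costs between processes.

```python
import multiprocessing
import pickle

import numpy as np

n = 40 * 40
raw = multiprocessing.RawArray("d", n)         # shared buffer of C doubles
shared = np.frombuffer(raw, dtype=np.float64)  # NumPy view, no copy made
shared[:] = 0.0

# Pickling the view writes every element into the payload.
payload = pickle.dumps(shared, protocol=pickle.HIGHEST_PROTOCOL)
print("array bytes: %d, pickled bytes: %d" % (shared.nbytes, len(payload)))

# The unpickled array no longer refers to the shared buffer.
copy = pickle.loads(payload)
print("shares memory with original:", np.shares_memory(copy, shared))

# Writing to the shared buffer is therefore not visible in the copy.
shared[0] = 123.0
print("shared[0] = %r, copy[0] = %r" % (shared[0], copy[0]))
```

So an array built this way gets no special treatment from pickle: the data
is copied just as it would be for an ordinary array.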




