[Numpy-discussion] Process-shared memory allocation per default?

Mon Oct 31 06:22:29 EDT 2011

This comes out of a long discussion on the Cython list. Following Mark's 
success with the shared memory parallelism, the question is: Where to 
take Cython's capabilities for parallelism further?

One idea that has come up now and then is that we could use something
like:

  - multiprocessing (to get rid of any GIL issues)

  - allocate all NumPy arrays in process-shared memory, so that passing
NumPy arrays between processes happens by "pickling views".

This can be done with current NumPy by using a separate constructor, e.g.,

a = sharedmem_zeros((3, 3))
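A minimal sketch of such a constructor, assuming Python 3.8+'s multiprocessing.shared_memory module (which postdates this message). The name sharedmem_zeros comes from the line above, but this signature is an illustrative assumption: unlike the one-liner, it also returns the SharedMemory handle, because something has to keep the block alive and eventually release it.

```python
import numpy as np
from multiprocessing import shared_memory


def sharedmem_zeros(shape, dtype=np.float64):
    """Return (array, shm): a zero-filled array backed by a named shared block.

    Hypothetical helper. The caller must keep `shm` alive while the array is
    in use, delete the array, then call shm.close() (and shm.unlink() in the
    owning process).
    """
    dtype = np.dtype(dtype)
    # SharedMemory rejects size=0 when creating, so allocate at least one byte.
    nbytes = max(1, int(np.prod(shape)) * dtype.itemsize)
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    a = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    a.fill(0)  # be explicit rather than rely on the OS zeroing new pages
    return a, shm
```

Another process can then attach to the same memory with shared_memory.SharedMemory(name=shm.name) and wrap its buf in an ndarray of the same shape and dtype.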

However, construction of the array feels like the wrong place to make
this decision. It is really when the array is sent to another process
that the decision should be made. If all NumPy arrays were allocated in
shared memory by default, one could do

shared_queue.put(a.shared())

and shared() would wrap a in something that pickled a shared memory 
pointer rather than the data (and unpickled directly to the NumPy array).

I just find this *a lot* more convenient than the tedious business of 
making sure the memory is allocated in the right way everywhere. Any 
downsides to doing this? (Additional overhead for small arrays perhaps?)

  - On the Cython end, parallelism could then be supported either by
low-level message passing using ZeroMQ (possibly with syntax candy for
sending typed messages), or by another multiprocessing backend to the
current prange, which would require that any memoryviews worked on are
allocated in shared memory.
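For the second option, a prange-style loop over a shared-memory array might look roughly like this. It is a Python-level sketch, not Cython's prange: it assumes a POSIX system (it explicitly requests the "fork" start method) and Python 3.8+, and parallel_squares and _worker are hypothetical names. Each worker attaches to the block by name and fills a contiguous chunk of indices, so only the block name and loop bounds cross the process boundary.

```python
import multiprocessing as mp
import numpy as np
from multiprocessing import shared_memory


def _worker(shm_name, shape, dtype, start, stop):
    # Attach to the existing block by name; no array data is copied.
    shm = shared_memory.SharedMemory(name=shm_name)
    out = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    for i in range(start, stop):  # this worker's slice of the "prange"
        out[i] = i * i
    del out  # drop the buffer export before closing the mapping
    shm.close()


def parallel_squares(n, nworkers=4):
    dtype = np.dtype(np.float64)
    shm = shared_memory.SharedMemory(create=True, size=max(1, n * dtype.itemsize))
    try:
        ctx = mp.get_context("fork")  # POSIX-only assumption
        chunk = (n + nworkers - 1) // nworkers
        procs = []
        for w in range(nworkers):
            start, stop = w * chunk, min(n, (w + 1) * chunk)
            if start >= stop:
                break
            p = ctx.Process(target=_worker,
                            args=(shm.name, (n,), dtype, start, stop))
            p.start()
            procs.append(p)
        for p in procs:
            p.join()
            if p.exitcode != 0:
                raise RuntimeError(f"worker exited with code {p.exitcode}")
        # Copy out so the block can be released; the temporary view dies here.
        result = np.ndarray((n,), dtype=dtype, buffer=shm.buf).copy()
    finally:
        shm.close()
        shm.unlink()
    return result
```

Static contiguous chunking mirrors the simplest prange schedule; a dynamic schedule would need a shared work counter as well.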

I'm just looking for feedback here. I don't have cycles for the
implementation; the point is that what NumPy users and devs think about
this could direct the further discussion of parallelism within Cython.

Dag Sverre