[Numpy-discussion] Process-shared memory allocation per default?
Dag Sverre Seljebotn
d.s.seljebotn at astro.uio.no
Mon Oct 31 06:22:29 EDT 2011
This comes out of a long discussion on the Cython list. Following Mark's
success with the shared memory parallelism, the question is: where should
Cython's capabilities for parallelism go next?
One idea that has come up now and then is that we could basically use
something like:
- multiprocessing (to get rid of any GIL issues)
- allocate all NumPy arrays in process-shared memory; passing NumPy
arrays between processes happens by "pickling views".
This can be done with current NumPy by using a separate constructor, e.g.,
a = sharedmem_zeros((3, 3))
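To make the constructor idea concrete, here is a minimal sketch of what such a function could look like. It assumes the standard library's multiprocessing.shared_memory module (Python 3.8 and later, so much newer than this message); the body of sharedmem_zeros and the choice to return the memory handle alongside the array are illustrative assumptions, not an existing NumPy API.

```python
import numpy as np
from multiprocessing import shared_memory


def sharedmem_zeros(shape, dtype=np.float64):
    """Hypothetical helper: a zero-filled array backed by shared memory."""
    dtype = np.dtype(dtype)
    # A shared block must have a positive size, even for empty arrays.
    nbytes = max(1, int(np.prod(shape)) * dtype.itemsize)
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    arr.fill(0)
    # The caller has to keep shm alive and eventually close and unlink it.
    return arr, shm


a, shm = sharedmem_zeros((3, 3))
a[1, 1] = 5.0

# Attaching by name (here in the same process, for brevity) maps the same memory.
peer = shared_memory.SharedMemory(name=shm.name)
b = np.ndarray((3, 3), dtype=np.float64, buffer=peer.buf)
seen = float(b[1, 1])  # the write made through `a` is visible through `b`

# Arrays must be dropped before the blocks they view can be closed.
del a, b
peer.close()
shm.close()
shm.unlink()
```

Note how the allocation decision leaks into the calling code: the caller has to pick this constructor up front and manage the handle's lifetime.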
However, construction of the array feels like the wrong place to make
this decision. The decision should really be made when the array is sent
to another process. If all NumPy arrays were allocated in shared
memory by default, one could do
shared_queue.put(a.shared())
and shared() would wrap a in something that pickled a shared memory
pointer rather than the data (and unpickled directly to the NumPy array).
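A rough sketch of the "pickle a pointer, not the data" step, again assuming multiprocessing.shared_memory. The names shared_descriptor and attach_shared are hypothetical, and the array is allocated in shared memory explicitly here, since NumPy does not do so by default. The pickle round trip happens within a single process purely for illustration.

```python
import pickle
import numpy as np
from multiprocessing import shared_memory


def shared_descriptor(shm, arr):
    """Hypothetical: describe where the array lives instead of copying it."""
    return (shm.name, arr.shape, arr.dtype.str)


def attach_shared(descriptor):
    """Hypothetical: rebuild an array view from a descriptor."""
    name, shape, dtype = descriptor
    shm = shared_memory.SharedMemory(name=name)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    # The handle is returned too, because it must outlive the array.
    return arr, shm


# Sender side: an array that already lives in shared memory.
shm = shared_memory.SharedMemory(create=True, size=1000 * 8)
a = np.ndarray((1000,), dtype=np.float64, buffer=shm.buf)
a[:] = 0.0
payload = pickle.dumps(shared_descriptor(shm, a))

# Receiver side: unpickle the descriptor and map the same memory.
b, peer = attach_shared(pickle.loads(payload))
b[0] = 42.0

payload_is_small = len(payload) < a.nbytes  # only name and layout were pickled
same_memory = float(a[0]) == 42.0           # no copy was made

# Arrays must be dropped before the blocks they view can be closed.
del a, b
peer.close()
shm.close()
shm.unlink()
```

A real shared() would presumably hide both helpers behind a wrapper object with a custom __reduce__, so that unpickling yields the array directly and its lifetime is tied to the mapping.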
I just find this *a lot* more convenient than the tedious business of
making sure the memory is allocated in the right way everywhere. Any
downsides to doing this? (Additional overhead for small arrays perhaps?)
- On the Cython end, parallelism could then be supported either by
low-level message passing using ZeroMQ (possibly with syntax candy for
sending typed messages), or by another multiprocessing backend to the
current prange which requires that any memoryviews worked on are
allocated in shared memory.
I'm just looking for feedback here. I don't have cycles in terms of
implementation; the point is that what NumPy users and devs are thinking
about this could direct the further discussion of parallelism within Cython.
Dag Sverre