[Numpy-discussion] resizeable arrays using shared memory?

Sebastian Berg sebastian at sipsolutions.net
Sat Feb 6 21:01:41 EST 2016


On Sa, 2016-02-06 at 16:56 -0600, Elliot Hallmark wrote:
> Hi all,
> 
> I have a program that uses resizable arrays.  I already over-provision
> the arrays and use slices, but every now and then the data outgrows
> that array and it needs to be resized.
> 
> Now, I would like to have these arrays shared between processes
> spawned via multiprocessing (for fast interprocess communication
> purposes, not for parallelizing work on an array).  I don't care
> about mapping to a file on disk, and I don't want disk I/O happening.
>   I don't care (really) about data being copied in memory on resize. 
> I *do* want the array to be resized "in place", so that the child
> processes can still access the arrays from the object they were
> initialized with.
> 
> 
> I can share arrays easily using arrays that are backed by an mmap,
> i.e.:
> 
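>     ```
>     # Source: http://github.com/rainwoodman/sharedmem
>     import mmap
> 
>     import numpy
> 
> 
>     class anonymousmemmap(numpy.memmap):
>         def __new__(subtype, shape, dtype=numpy.uint8, order='C'):
>             descr = numpy.dtype(dtype)
>             _dbytes = descr.itemsize
> 
>             # Total number of elements across all dimensions.
>             shape = numpy.atleast_1d(shape)
>             size = 1
>             for k in shape:
>                 size *= k
> 
>             nbytes = int(size * _dbytes)
> 
>             if nbytes > 0:
>                 # Anonymous mapping (fileno -1): inherited by forked
>                 # children and never backed by a file on disk.
>                 mm = mmap.mmap(-1, nbytes)
>             else:
>                 # mmap cannot map zero bytes, so fall back to an empty buffer.
>                 mm = numpy.empty(0, dtype=descr)
>             self = numpy.ndarray.__new__(subtype, shape, dtype=descr,
>                                          buffer=mm, order=order)
>             # Mirror numpy.memmap, which keeps its mapping in _mmap.
>             self._mmap = mm
>             return self
> 
>         def __array_wrap__(self, outarr, context=None):
>             # Return plain ndarrays from ufuncs instead of the subclass.
>             return numpy.ndarray.__array_wrap__(self.view(numpy.ndarray),
>                                                 outarr, context)
>     ```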
> 
> This cannot be resized because it does not own its data
> ("ValueError: cannot resize this array: it does not own its data").
> numpy.memmap has the same issue [0], even if I set refcheck to False
> and even though the docs say otherwise [1].
> 
> `arr._mmap.resize(x)` fails because it is anonymous (error: [Errno 9]
> Bad file descriptor).  If I create a file and use that fileno to
> create the memmap, then I can resize `arr._mmap` but the array itself
> is not resized.
> 
> Is there a way to accomplish what I want?  Or, do I just need to
> figure out a way to communicate new arrays to the child processes?
> 

I guess the answer is no. The first question, though, is whether you
can instead create a new, larger array that views the same data. Since
you have the mmap, that would just mean creating a new view into it.

I.e. your "array" would be the memmap, and to use it, you always rewrap
it into a new numpy array.
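A minimal sketch of the rewrap idea, under several assumptions not stated in the thread: the array is 1-D, its capacity is fixed up front (over-provisioned, as in the question), and child processes are started with the fork start method after the object exists, so they inherit the same anonymous mmap. `GrowableSharedArray`, `view()` and `append()` are hypothetical names; the logical length is kept in a `multiprocessing.Value` so every process can rewrap the mapping to the current size.

```python
import mmap
import multiprocessing

import numpy as np


class GrowableSharedArray:
    """Hypothetical helper: a 1-D array living in a fixed-size anonymous mmap.

    Assumes children are forked after construction, so they inherit both
    the mapping and the shared length counter.
    """

    def __init__(self, capacity, dtype=np.float64):
        self.dtype = np.dtype(dtype)
        self.capacity = int(capacity)
        # Anonymous, writable mapping; its size never changes.
        self._mm = mmap.mmap(-1, self.capacity * self.dtype.itemsize)
        # Logical length, stored in shared memory so all processes see updates.
        self._length = multiprocessing.Value("q", 0)

    def view(self):
        """Rewrap the mapping as a fresh ndarray of the current length."""
        return np.frombuffer(self._mm, dtype=self.dtype,
                             count=self._length.value)

    def append(self, values):
        values = np.asarray(values, dtype=self.dtype).ravel()
        with self._length.get_lock():
            start = self._length.value
            stop = start + values.size
            if stop > self.capacity:
                # Growing past capacity needs a new mapping, which already
                # running children cannot see; not handled in this sketch.
                raise ValueError("capacity exceeded; allocate a larger buffer")
            full = np.frombuffer(self._mm, dtype=self.dtype,
                                 count=self.capacity)
            full[start:stop] = values
            self._length.value = stop
```

Each call to `view()` returns a new ndarray over the same memory, so code that always calls it before use sees appended data, while an older view keeps its old length. Growing beyond the capacity still requires a new mapping and some way of telling the children about it.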

Other than that, you would have to mess with the internal ndarray
structure, since these kinds of operations appear rather unsafe.
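The safety rule NumPy enforces here can be checked directly. As a small illustration, assuming only an anonymous mmap like the one in the question, an array created over the mapping does not own its memory, so `ndarray.resize` raises the ValueError quoted above even with `refcheck=False`:

```python
import mmap

import numpy as np

# An anonymous 64-byte mapping, standing in for the shared buffer.
mm = mmap.mmap(-1, 64)
arr = np.frombuffer(mm, dtype=np.uint8)

# The array borrows the mapping's memory instead of owning it.
print(arr.flags.owndata)  # False

# NumPy only reallocates memory it allocated itself, so it refuses here
# regardless of the refcheck setting.
try:
    arr.resize(128, refcheck=False)
except ValueError as exc:
    print("resize failed:", exc)
```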

- Sebastian


> Thanks,
>   Elliot
> 
> [0] https://github.com/numpy/numpy/issues/4198
> 
> [1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.resize.html
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

