[Numpy-discussion] .transpose() of memmap array fails to close()

Glen W. Mabey Glen.Mabey at swri.org
Thu Aug 16 10:17:10 EDT 2007


On Wed, Aug 15, 2007 at 08:50:28PM -0400, Anne Archibald wrote:
> You have to be a bit careful, because a view really is just a view
> into the array - the original is still around. So you can't really
> delete the array contents when the view is deleted. Really, if you do:
> B = A[::2]
> del B
> nothing at all should happen to A.

Okay, right.  I was muddling those two concepts.

> But to be pythonic, or numpythonic, when the original A is
> garbage-collected, the garbage collection should certainly close the
> mmap.

Hmm, this would be less than ideal for my use case, when the data on
disk is organized in a different dimensional order than I want to refer
to it in my code.  For example:

import numpy
# datafilename is the path to the existing data file on disk
p_data = numpy.memmap(datafilename, shape=(10, 1024, 20), dtype=numpy.float32, mode='r')
u_data = p_data.transpose([2, 0, 1])

and I don't want to have to keep track of p_data because it's only u_data
that I care about and want to use.  And I promise, this is not a
contrived example.  I have data that I really do want to be ordered in a
certain way on disk, for I/O efficiency reasons, yet when I logically
index into it in my code, I want the dimensions to be in a different
order.
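
To make the intent concrete, here is a small sketch of the situation. The
scratch file and its contents are made up for illustration (they stand in
for my real data file); the point is only that the transposed object is a
view onto the same mapped file, indexed in a different axis order:

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file standing in for `datafilename`.
path = os.path.join(tempfile.mkdtemp(), "example.dat")
np.arange(10 * 1024 * 20, dtype=np.float32).tofile(path)

p_data = np.memmap(path, shape=(10, 1024, 20), dtype=np.float32, mode="r")
u_data = p_data.transpose([2, 0, 1])

# The view reorders the axes without copying the file contents.
shares = np.shares_memory(p_data, u_data)

# With axes (2, 0, 1), u_data[i, j, k] refers to p_data[j, k, i].
same_value = u_data[3, 7, 100] == p_data[7, 100, 3]
```

So the on-disk layout stays (10, 1024, 20), while the code indexes a
(20, 10, 1024) array.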

> Being able to apply flush() or whatever to slices is not necessarily
> unpythonic, but it's probably a lot simpler to reliably implement
> slices of mmap()s as simple slices of ordinary arrays. 

I considered this approach, but what happens if you want to instantiate
a slice that is very large, e.g., larger than your physical RAM?  In
that case, you can't afford to make simple slices be ordinary arrays,
quite apart from the case where you want to change values on disk.
Making them functionally memmap arrays, but without .sync() and
.close(), doesn't seem right either.

> It means you
> need to keep the original mmap object around (or traverse up the tree
> of bases:
> T = A
> while T.base is not None: T = T.base
> T.flush()
> )
> 
> (Note that this would be simpler if when you did
> A = arange(100)
> B = A[::2]
> C = B[::2]
> you found that C.base were A rather than B.)

Okay, this would make it so that I didn't have to explicitly keep track
of p_data, in my example.  Not bad, although I'd never noticed a .base
member before ...
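
For my example, the traversal Anne describes might look roughly like the
sketch below. It assumes a NumPy where memmap objects have a flush()
method, and it stops walking as soon as .base is no longer an ndarray,
since the base of the original memmap may be the underlying mmap object
rather than None. The helper name root_array and the scratch file are
made up for illustration:

```python
import os
import tempfile

import numpy as np


def root_array(view):
    """Follow .base links while they are ndarrays; return the top-most array."""
    t = view
    while isinstance(t.base, np.ndarray):
        t = t.base
    return t


# Hypothetical scratch file standing in for the real data file.
path = os.path.join(tempfile.mkdtemp(), "example.dat")

p_data = np.memmap(path, shape=(10, 1024, 20), dtype=np.float32, mode="w+")
u_data = p_data.transpose([2, 0, 1])
del p_data  # only the transposed view is kept from here on

u_data[3, 7, 100] = 42.0  # same element as p_data[7, 100, 3]

root = root_array(u_data)
root.flush()  # ask the top-most memmap to write changes back to the file

# Read the file back in its on-disk axis order to check the write.
on_disk = np.fromfile(path, dtype=np.float32).reshape(10, 1024, 20)
```

That way only u_data has to be kept around, and the original mapping can
still be reached when it is time to flush.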

Thank you,
Glen Mabey


