[SciPy-dev] Memory mapped files in scipy core

Travis Oliphant oliphant at ee.byu.edu
Sun Nov 20 04:03:05 EST 2005


I would appreciate understanding the typical use cases for memory-mapped 
files.   I am not sure I understand why certain choices were made for 
numarray's memmap and memmap slice classes.  They seem like a lot of 
"extra" stuff and I'm not sure what all that stuff is for.

Rather than just copy these over, I would like to understand what people 
typically want to do with memory-mapped files to see if scipy core 
doesn't already provide that.

For example, right now I can open a file, use mmap to obtain a memory 
map object and then pass that object into frombuffer in scipy_core to 
get an ndarray whose memory maps a file on disk. 

Now, this ndarray can be sliced and indexed and manipulated all the 
while referring to the file on disk (well technically, I suppose, the 
memory-mapped object would need to be flushed to synchronize). 
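
To make the steps above concrete, here is a minimal sketch.  It assumes 
the present-day numpy package as a stand-in for scipy_core, and the 
scratch file name and the float64 dtype are made up for illustration:

```python
import mmap
import os
import tempfile

import numpy as np  # assumption: numpy stands in for scipy_core here

# Create a small scratch file holding eight float64 zeros (illustrative data).
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(np.zeros(8, dtype=np.float64).tobytes())

# Open the file for update and map all of it (length 0 means "whole file").
f = open(path, "r+b")
mm = mmap.mmap(f.fileno(), 0)

# The mmap object exposes a writable buffer, so the array shares its memory.
arr = np.frombuffer(mm, dtype=np.float64)

# Ordinary slicing and assignment write into the mapped pages.
arr[2:5] = [1.0, 2.0, 3.0]
view = arr[::2]  # a strided view; still refers to the same mapping

# Flush the mapping so the file on disk is synchronized.
mm.flush()

# Read the file back through a separate handle to confirm the contents.
with open(path, "rb") as check:
    on_disk = np.frombuffer(check.read(), dtype=np.float64)
```

Note that the mmap object cannot be closed while arr still refers to its 
buffer, so the array has to be dropped first.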

Now, I could see wanting to make the process of opening the file, 
getting the mmap object and handing its buffer to the array object a 
little easier.  Thus, a simple memmap class would be a useful construct 
-- I could even see it inheriting from the ndarray directly and adding a 
few methods.   I guess I just don't see why one would care about a 
memory-mapped slice object, when the mmaparray sub-class would be 
perfectly useful.
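
A rough sketch of what such a sub-class might look like, again assuming 
numpy in place of scipy_core.  The names MmapArray and sync, and the 
constructor arguments, are hypothetical and only meant to show the idea; 
this is not the numarray design or a proposed API:

```python
import mmap
import os
import tempfile

import numpy as np  # assumption: numpy stands in for scipy_core here


class MmapArray(np.ndarray):
    """Hypothetical 1-d ndarray sub-class backed by a memory-mapped file."""

    def __new__(cls, filename, dtype=np.uint8, mode="r+"):
        access = mmap.ACCESS_READ if mode == "r" else mmap.ACCESS_WRITE
        with open(filename, "rb" if mode == "r" else "r+b") as f:
            # The mapping stays valid after the file object is closed.
            mm = mmap.mmap(f.fileno(), 0, access=access)
        dtype = np.dtype(dtype)
        n = len(mm) // dtype.itemsize
        self = np.ndarray.__new__(cls, (n,), dtype=dtype, buffer=mm)
        self._mmap = mm  # keep a handle so the mapping can be flushed
        return self

    def __array_finalize__(self, obj):
        # Slices and views pick up the parent's mmap handle, so no
        # separate "memmap slice" class is needed in this sketch.
        self._mmap = getattr(obj, "_mmap", None)

    def sync(self):
        """Flush the underlying mapping to disk, if there is one."""
        if self._mmap is not None:
            self._mmap.flush()


# Usage: four int32 zeros on disk, modified through a slice.
path = os.path.join(tempfile.mkdtemp(), "ints.bin")
with open(path, "wb") as f:
    f.write(np.zeros(4, dtype=np.int32).tobytes())

a = MmapArray(path, dtype=np.int32)
s = a[1:3]       # still an MmapArray, still backed by the file
s[:] = [7, 9]
s.sync()

with open(path, "rb") as check:
    result = np.frombuffer(check.read(), dtype=np.int32)
```

One wrinkle with this approach: results of arithmetic on such an array 
are also typed as the sub-class even though their memory is not mapped, 
so a real implementation would want to handle that case.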


On a related, but orthogonal note:

My understanding is that using memory-mapped files for *very* large 
files will require modification to the mmap module in Python --- 
something I think we should push.  One part of that process would be to 
add the C-struct array interface to the mmap module and the buffer 
object -- perhaps this is how we get the array interface into Python 
quickly.   Then, if we could make a base-type mmap that did not use the 
buffer interface or the sequence interface (similar to the bigndarray in 
scipy_core) and therefore by-passed the problems with Python in those 
areas, then the current mmap object could inherit from the base class 
and provide current functionality while still exposing the array 
interface for access to >2GB files on 64-bit systems.

Who would like to take up the ball for modifying mmap in Python in this 
fashion?

-Travis



