[Numpy-discussion] Loading a > GB file into array

Martin Spacek numpy at mspacek.mm.st
Fri Nov 30 18:09:16 EST 2007


Kurt Smith wrote:
 > You might try numpy.memmap -- others have had success with it for
 > large files (32 bit should be able to handle a 1.3 GB file, AFAIK).

Yeah, I looked into numpy.memmap. Two issues with that. First, I need to 
eliminate as much disk access as possible while my app is running. I'm 
displaying stimuli on a screen at 200 Hz, so I have at most 5 ms for each 
movie frame to load before it's too late and a frame gets dropped. I'm 
sort of faking a realtime OS on Windows by setting the process priority 
really high. Disk access in the middle of that causes dropped frames. So 
I need to load the whole file into physical RAM, although it need not be 
contiguous. memmap doesn't do that: it loads data on the fly as you index 
into the array, which drops frames, so that doesn't work for me.
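
For illustration, here is a minimal sketch of reading the whole file into 
RAM up front as a list of separate arrays, so that no single contiguous 
1.3 GB block is required. The names load_file_in_chunks, get_frame, 
frame_bytes and frames_per_chunk are hypothetical, not from my app, and 
the sketch assumes raw uint8 frames of a fixed size. Note that this only 
puts the data in the process's memory; the OS could in principle still 
page it out.

```python
import numpy as np

def load_file_in_chunks(fname, frame_bytes, frames_per_chunk=1000):
    """Read an entire file into memory as a list of uint8 arrays.

    Each chunk holds a whole number of frames (the last one may be
    shorter), so the chunks need not be contiguous with each other.
    """
    chunk_bytes = frame_bytes * frames_per_chunk
    chunks = []
    with open(fname, 'rb') as f:
        while True:
            buf = f.read(chunk_bytes)
            if not buf:
                break  # end of file
            # frombuffer wraps the bytes without copying (read-only array)
            chunks.append(np.frombuffer(buf, dtype=np.uint8))
    return chunks

def get_frame(chunks, i, frame_bytes, frames_per_chunk=1000):
    """Return frame i as a 1-D uint8 view; no disk access happens here."""
    chunk = chunks[i // frames_per_chunk]
    start = (i % frames_per_chunk) * frame_bytes
    return chunk[start:start + frame_bytes]
```

All disk reads happen inside load_file_in_chunks, before the display 
loop starts; get_frame afterwards is only indexing into arrays that are 
already in memory.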

The 2nd problem I had with memmap was that I was getting a WindowsError 
related to memory:

 >>> data = np.memmap(1.3GBfname, dtype=np.uint8, mode='r')

Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "C:\bin\Python25\Lib\site-packages\numpy\core\memmap.py", line 67, in __new__
     mm = mmap.mmap(fid.fileno(), bytes, access=acc)
WindowsError: [Error 8] Not enough storage is available to process this command


This was for the same 1.3 GB file. This is different from the memory 
errors I mentioned previously. I don't get this on Ubuntu. I can memmap a 
file up to 2 GB on Ubuntu no problem, but any larger than that and I get 
this:

 >>> data = np.memmap(2.1GBfname, dtype=np.uint8, mode='r')

Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/usr/lib/python2.5/site-packages/numpy/core/memmap.py", line 67, in __new__
     mm = mmap.mmap(fid.fileno(), bytes, access=acc)
OverflowError: cannot fit 'long' into an index-sized integer

The OverflowError is on the bytes argument. If I call mmap.mmap directly 
in Python, I get the same error. So I guess it's due to me running 32-bit 
Ubuntu.
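
A rough way to check this guess before calling memmap is to compare the 
file size against the largest index-sized integer of the running Python 
build. This is only a sketch: the helper names fits_in_index and 
can_map_whole_file are hypothetical, and it uses sys.maxsize, which 
exists in newer Pythons than the 2.5 used above. On a 32-bit build 
sys.maxsize is 2**31 - 1, i.e. just under 2 GiB.

```python
import os
import sys

def fits_in_index(nbytes):
    """True if nbytes fits in an index-sized integer on this build."""
    return nbytes <= sys.maxsize

def can_map_whole_file(fname):
    """True if the file's size could be passed as a single mmap length.

    This only checks the integer limit; the mapping can still fail for
    other reasons, such as not enough free address space.
    """
    return fits_in_index(os.path.getsize(fname))
```

Passing this check does not guarantee the mapping succeeds, as the 
WindowsError on the 1.3 GB file shows.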

Martin


