[Numpy-discussion] Loading a > GB file into array
Martin Spacek
numpy at mspacek.mm.st
Fri Nov 30 18:09:16 EST 2007
Kurt Smith wrote:
> You might try numpy.memmap -- others have had success with it for
> large files (32 bit should be able to handle a 1.3 GB file, AFAIK).
Yeah, I looked into numpy.memmap. Two issues with that. First, I need to
eliminate as much disk access as possible while my app is running. I'm
displaying stimuli on a screen at 200 Hz, so I have up to 5 ms for each
movie frame to load before it's too late and it drops a frame. I'm sort
of faking a realtime OS on Windows by setting the process priority
really high. Disk access in the middle of that causes frames to drop. So
I need to load the whole file into physical RAM, although it need not be
contiguous. memmap doesn't do that: it loads on the fly as you index
into the array, which drops frames, so that doesn't work for me.
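For illustration, here is a minimal sketch of reading a file fully into memory as a list of separate arrays, so that no single contiguous block is required. The function name load_file_in_chunks and the 64 MB chunk size are hypothetical choices, not something from my app, and it assumes the data is uint8 as in the memmap calls in this message. Note that it only reads the data up front; it cannot by itself stop the OS from paging it out later.

```python
import numpy as np


def load_file_in_chunks(fname, chunk_bytes=64 * 1024 * 1024):
    """Read all of `fname` into a list of uint8 arrays.

    Each chunk is allocated separately, so the data does not need one
    contiguous block of address space. The arrays are read-only because
    they wrap the bytes objects returned by read().
    """
    chunks = []
    with open(fname, "rb") as f:
        while True:
            buf = f.read(chunk_bytes)
            if not buf:  # empty bytes object means end of file
                break
            chunks.append(np.frombuffer(buf, dtype=np.uint8))
    return chunks
```

With fixed-size frames, choosing chunk_bytes as a multiple of the frame size would keep each frame inside a single chunk, so a frame can be looked up by dividing its byte offset by chunk_bytes.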
The 2nd problem I had with memmap was that I was getting a WindowsError
related to memory:
>>> data = np.memmap(1.3GBfname, dtype=np.uint8, mode='r')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\bin\Python25\Lib\site-packages\numpy\core\memmap.py", line 67, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc)
WindowsError: [Error 8] Not enough storage is available to process this command
This was for the same 1.3 GB file. This is different from the previous
memory errors I mentioned. I don't get this on Ubuntu. I can memmap a file
up to 2 GB on Ubuntu no problem, but any larger than that and I get this:
>>> data = np.memmap(2.1GBfname, dtype=np.uint8, mode='r')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/site-packages/numpy/core/memmap.py", line 67, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc)
OverflowError: cannot fit 'long' into an index-sized integer
The OverflowError is on the bytes argument. If I call mmap.mmap
directly in Python, I get the same error. So I guess it's due to me
running 32-bit Ubuntu.
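A rough sketch of that guess: on a 32-bit build the largest index-sized integer is 2**31 - 1, so a byte count just over 2 GB can't be passed as a length. The helper name fits_in_index is hypothetical, and sys.maxsize only exists from Python 2.6 on, so this would not run as-is on the 2.5 install above. It says nothing about the WindowsError, which is a different failure.

```python
import sys


def fits_in_index(nbytes, max_index=sys.maxsize):
    """Return True if nbytes can be passed as an index-sized integer.

    max_index defaults to this interpreter's limit; pass 2**31 - 1 to
    see what a 32-bit build would allow.
    """
    return nbytes <= max_index


LIMIT_32BIT = 2**31 - 1  # largest index on a 32-bit build

# Approximate sizes of the two files mentioned in this message
size_1_3_gb = int(1.3 * 1024**3)
size_2_1_gb = int(2.1 * 1024**3)

print(fits_in_index(size_1_3_gb, LIMIT_32BIT))
print(fits_in_index(size_2_1_gb, LIMIT_32BIT))
```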
Martin