[Numpy-discussion] Loading a > GB file into array

Martin Spacek numpy at mspacek.mm.st
Mon Dec 3 15:44:31 EST 2007


Francesc Altet wrote:
> One thing that may improve your timings is to first read through 
> your data file(s), throwing the data away as you read it. This 
> serves only to load the file entirely (if you have enough memory, 
> which seems to be your case) into the OS page cache. Then, the 
> second time your code has to read the data, the OS only has to 
> retrieve it from its cache (i.e. from memory) rather than from disk.

I think I tried that: loading the whole file into memory, throwing it 
away, then trying to load on the fly from "disk" (which would now 
hopefully be served faster the 2nd time around) while displaying 
the movie, but I still got update times > 5 ms. The file's just too big 
to get any improvement from preloading it this way.
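
For anyone following along, the preloading pass being discussed can be 
sketched roughly as below. This is only a minimal sketch, not the code I 
actually ran: the function name warm_page_cache and the 16 MB chunk size 
are made up for illustration, and whether the OS keeps the pages cached 
afterwards depends on how much free RAM there is relative to the file.

```python
def warm_page_cache(path, chunk_size=16 * 1024 * 1024):
    """Read `path` front to back and discard the data.

    The only purpose is to make the OS pull the file through its page
    cache. Returns the number of bytes read. The chunk size is an
    arbitrary choice for this sketch, not a tuned value.
    """
    total = 0
    # buffering=0 skips Python's own buffer; the OS page cache is still used
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)  # data is dropped as soon as it is counted
    return total
```

You would call this once on the movie file before starting playback. If 
the file is bigger than what the OS is willing to cache, the early pages 
can be evicted before the second read gets to them, which would leave you 
reading from disk again.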

> You can do this with whatever technique you want, but if you want to 
> read from a single container and memmap is giving you headaches on 
> 32-bit platforms, you might try PyTables because it allows 64-bit disk 
> addressing transparently, even on 32-bit machines.

PyTables sounds interesting, I might take a look. Thanks.

Martin


