[Numpy-discussion] Huge arrays

Francesc Alted faltet at pytables.org
Wed Sep 9 05:55:07 EDT 2009


On Wednesday 09 September 2009 10:48:48, Francesc Alted wrote:
> OTOH, having the possibility to manage compressed data buffers
> transparently in NumPy would help here, but not there yet ;-)

Now that I think about it, if the data is compressible, Daniel could try 
defining a PyTables compressed array or table on disk and saving chunks to it.
If the data compresses well enough, the filesystem cache will keep it in memory 
until the disk can eventually absorb it.

For this, I would recommend using the LZO compressor, as it is one of the 
fastest I've seen (at least until Blosc is ready): it can compress up to 5 
times faster than the data can be written to disk (depending on how 
compressible the data is and on the speed of the disk subsystem).
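
A minimal sketch of what I mean (file and node names are just illustrative; it 
assumes a PyTables build with LZO support, and uses the snake_case API -- older 
releases spell these openFile/createEArray):

    import numpy as np
    import tables

    # Extendible, LZO-compressed array on disk (complevel 1-9; 5 is a middle ground)
    f = tables.open_file("scratch.h5", mode="w")
    filters = tables.Filters(complevel=5, complib="lzo")
    arr = f.create_earray(f.root, "data",
                          atom=tables.Float64Atom(),
                          shape=(0, 1000000),      # grows along the first axis
                          filters=filters,
                          expectedrows=1000)

    # Append chunks as they are produced; if they compress well, most of the
    # compressed output stays in the filesystem cache until the disk absorbs it
    for i in range(1000):
        chunk = np.linspace(i, i + 1, 1000000).reshape(1, -1)  # compressible data
        arr.append(chunk)

    f.close()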

Of course, if the data is not compressible at all, then this avenue doesn't 
make much sense.

HTH,

-- 
Francesc Alted
