[SciPy-user] Fast saving/loading of huge matrices

Francesc Altet faltet at carabos.com
Thu Apr 19 15:42:32 EDT 2007


On Thu, 19 Apr 2007 at 14:19 -0500, Ryan Krauss wrote:
> I just changed from simply reading a text file using io.read_array to
> cPickle and got a factor of 4 or 5 speed up for my medium sized array.
>  But the cPickle file is quite large (about twice the size of the
> ascii file - I don't think the ascii has very many digits).

Yeah, this is to be expected: pickle saves the complete set of digits
in binary form (8 bytes for each double-precision value), whereas if
you keep only 2 digits (plus the decimal point and a space) you need
only about 4 bytes per value in ASCII, hence the smaller text file.
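
For reference, here is a minimal sketch of the binary pickle round-trip
being discussed (the array, file name and protocol number are just
illustrative, not taken from Ryan's code):

  import cPickle
  import numpy

  a = numpy.random.rand(1000, 50)    # stand-in for the real data

  # save: protocol 2 stores each float64 as 8 raw bytes
  f = open('a.pkl', 'wb')
  cPickle.dump(a, f, 2)
  f.close()

  # load it back
  f = open('a.pkl', 'rb')
  b = cPickle.load(f)
  f.close()

  assert (a == b).all()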

> I thought there used to be some built in functions called something
> like shelve that stored dictionaries fairly quickly and compactly.
> Are those functions still around and I am just remembering the name
> wrong?  Or have they been done away with?  I remember vaguely that
> they stored data in 3 separate files - a python file that could later
> be imported, a dat file (I think) and something else.
> 
> The cPickle approach seems fast, I just wish there was some way to
> make the files smaller.  Is there a good way to do this that doesn't
> slow down the read time too much?

Try using compression. If your data doesn't have many significant
digits, chances are it can easily be compressed by up to 3x.  Many
compressors have a Python interface (your best bet is the zlib module
included in the standard library). Or try PyTables for transparent
compression support.
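
For instance, here is a rough sketch of combining cPickle with zlib
(the file name and compression level are arbitrary; note that random
data like this compresses poorly, while arrays with few significant
digits compress much better):

  import cPickle
  import zlib
  import numpy

  a = numpy.random.rand(1000, 50)    # stand-in for the real data

  # pickle to a string, then deflate it before writing to disk
  s = zlib.compress(cPickle.dumps(a, 2), 6)
  open('a.pkl.z', 'wb').write(s)

  # read back: inflate, then unpickle
  s = open('a.pkl.z', 'rb').read()
  b = cPickle.loads(zlib.decompress(s))

  assert (a == b).all()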

HTH,

-- 
Francesc Altet    |  Be careful about using the following code --
Carabos Coop. V.  |  I've only proven that it works, 
www.carabos.com   |  I haven't tested it. -- Donald Knuth



