[Numpy-discussion] cannot pickle large numpy objects when memory resources are already stressed

Francesc Altet faltet at carabos.com
Wed Mar 14 13:17:59 EDT 2007


On Wed, 14 Mar 2007 at 09:46 -0700, Travis Oliphant wrote:
> Glen W. Mabey wrote:
> 
> >Hello,
> >
> >After running a simulation that took 6 days to complete, my script
> >proceeded to attempt to write the results out to a file, pickled.
> >
> >The operation failed even though there was 1G of RAM free (4G machine).  
> >I've since reconsidered using the pickle format for storing data sets 
> >that include large numpy arrays.  However, somehow I assumed that one
> >would be able to pickle anything that you already had in memory, but I
> >see now that this was a rash assumption.
> >
> >Ought there to be a way to do this, or should I forget about being able
> >to bundle large numpy arrays and other objects in a single pickle?

If you can afford using another package for doing I/O, perhaps PyTables
can save your day. It is optimized for saving and retrieving very large
amounts of data with ease. In particular, it can save your in-memory
arrays without making another copy in memory (provided the array is
contiguous). It also lets you compress the data transparently, without
using additional memory.
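
As a minimal sketch of what I mean (this uses the PyTables
tables.open_file / create_carray spelling; the file name 'results.h5'
and the node name 'data' are just placeholders):

import numpy as np
import tables   # PyTables

# Placeholder data; substitute your own large array.
data = np.random.rand(1000, 1000)

# A chunked, compressed CArray is written to disk chunk by chunk, so
# no second full copy of the array is needed in memory, and zlib
# compression is applied transparently.
filters = tables.Filters(complevel=5, complib='zlib')
with tables.open_file('results.h5', mode='w') as h5:
    h5.create_carray(h5.root, 'data', obj=data, filters=filters)

# Later you can read back just a slice, without loading everything:
with tables.open_file('results.h5', mode='r') as h5:
    chunk = h5.root.data[:100]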

Furthermore, a recent optimization introduced in the 2.0 branch a week
ago also makes it possible to *update* an array on disk without making
copies either.
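
Continuing the sketch above (and assuming the 'results.h5' file and
'data' node from the previous example), an in-place update on disk
would look something like:

import tables

# Re-open the file in append mode and overwrite a slice of the array
# directly on disk; only the affected chunks are rewritten, and no
# in-memory copy of the whole array is made.
with tables.open_file('results.h5', mode='a') as h5:
    h5.root.data[:100, :] = 0.0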

HTH,

-- 
Francesc Altet    |  Be careful about using the following code --
Carabos Coop. V.  |  I've only proven that it works, 
www.carabos.com   |  I haven't tested it. -- Donald Knuth



