[Numpy-discussion] About the npz format

Valentin Haenel valentin at haenel.co
Thu Apr 17 16:26:35 EDT 2014


Hi,

* Julian Taylor <jtaylor.debian at googlemail.com> [2014-04-17]:
> On 17.04.2014 21:30, onefire wrote:
> > Hi Nathaniel,
> > 
> > Thanks for the suggestion. I did profile the program before, just not
> > using Python.
> 
> one problem of npz is that the zipfile module does not support streaming
> data in (or if it does now we aren't using it).
> So numpy writes the file uncompressed to disk and then zips it which is
> horrible for performance and disk usage.

As a workaround, it may also be possible to write the temporary NPY
files to cStringIO instances and then use ``ZipFile.writestr`` with the
``getvalue()`` of each cStringIO object. However, that approach may
require some memory. In Python 2.7, I believe each array would be held
twice: one copy inside the cStringIO instance and another copy returned
by calling ``getvalue()`` on it.
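
A minimal sketch of that workaround, untested against large arrays. It
assumes Python 3, where ``io.BytesIO`` stands in for cStringIO, and the
helper name ``savez_in_memory`` is hypothetical, not part of numpy:

```python
import io
import zipfile

import numpy as np


def savez_in_memory(path, **arrays):
    """Write arrays to an npz-style zip archive without temporary NPY files.

    Hypothetical helper for illustration; not how numpy implements savez.
    """
    with zipfile.ZipFile(path, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, arr in arrays.items():
            buf = io.BytesIO()
            # Serialize the array in NPY format into the in-memory buffer.
            np.save(buf, np.asanyarray(arr))
            # Hand the serialized bytes to the archive as one member.
            zf.writestr(name + ".npy", buf.getvalue())
```

Calling ``savez_in_memory("out.npz", a=a, b=b)`` should give a file that
``np.load`` can read back like a regular npz. Each array is still held
in memory in serialized form while it is written, so the memory concern
above applies.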

best,

V-


