[Numpy-discussion] About the npz format

Valentin Haenel valentin at haenel.co
Fri Apr 18 12:29:27 EDT 2014


Hi,

* Valentin Haenel <valentin at haenel.co> [2014-04-17]:
> * Valentin Haenel <valentin at haenel.co> [2014-04-17]:
> > * Julian Taylor <jtaylor.debian at googlemail.com> [2014-04-17]:
> > > On 17.04.2014 21:30, onefire wrote:
> > > > Thanks for the suggestion. I did profile the program before, just not
> > > > using Python.
> > > 
> > > one problem of npz is that the zipfile module does not support streaming
> > > data in (or if it does now we aren't using it).
> > > So numpy writes the file uncompressed to disk and then zips it which is
> > > horrible for performance and disk usage.
> > 
> > As a workaround, it may also be possible to write the temporary NPY files to
> > cStringIO instances and then use ``ZipFile.writestr`` with the
> > ``getvalue()`` of the cStringIO object. However, that approach may
> > require some memory. In Python 2.7, for each array: one copy inside the
> > cStringIO instance and then another copy when calling getvalue on the
> > cStringIO, I believe.
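
For illustration, a minimal sketch of the workaround described above. This
is not the code from the proof-of-concept branch. It assumes a modern
Python, where ``io.BytesIO`` stands in for cStringIO, and the helper name
``savez_in_memory`` is made up for this example.

```python
import io
import zipfile

import numpy as np


def savez_in_memory(path, **arrays):
    """Hypothetical helper: write an .npz without temporary .npy files."""
    with zipfile.ZipFile(path, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for name, arr in arrays.items():
            buf = io.BytesIO()
            # Serialize one array in NPY format into an in-memory buffer.
            np.save(buf, np.asanyarray(arr), allow_pickle=False)
            # getvalue() returns a copy of the buffer contents, so the
            # serialized array is briefly held in memory twice.
            zf.writestr(name + ".npy", buf.getvalue())
```

Calling ``savez_in_memory("out.npz", a=a, b=b)`` should give a file that
``np.load`` can read back like one written by ``np.savez``. The cost is the
extra memory per array while it is being written.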
> 
> There is a proof-of-concept implementation here:
> 
> https://github.com/esc/numpy/compare/feature;npz_no_temp_file

Is anybody interested in me fixing this up (unit tests, API, etc.) for
inclusion?

V-


