[Numpy-discussion] About the npz format

Fri Apr 18 07:01:09 EDT 2014

Hi again,

* onefire <onefire.myself at gmail.com> [2014-04-18]:
> I think your workaround might help, but a better solution would be to not
> use Python's zipfile module at all. This would make it possible to, say,
> let the user choose the checksum algorithm or to turn that off.
> Or maybe the compression stuff makes this route too complicated to be worth
> the trouble? (after all, the zip format is not that hard to understand)

Just to give you an idea of what my aforementioned Bloscpack library can
do in the case of linspace:

In [1]: import numpy as np

In [2]: import bloscpack as bp

In [3]: import bloscpack.sysutil as bps

In [4]: x = np.linspace(1, 10, 50000000)

In [5]: %timeit np.save("x.npy", x) ; bps.sync()
1 loops, best of 3: 2.12 s per loop

In [6]: %timeit bp.pack_ndarray_file(x, 'x.blp') ; bps.sync()
1 loops, best of 3: 627 ms per loop

In [7]: %timeit -n 3 -r 3 np.save("x.npy", x) ; bps.sync()
3 loops, best of 3: 1.92 s per loop

In [8]: %timeit -n 3 -r 3 bp.pack_ndarray_file(x, 'x.blp') ; bps.sync()
3 loops, best of 3: 564 ms per loop

In [9]: ls -lah x.npy x.blp
-rw-r--r-- 1 root root  49M Apr 18 12:53 x.blp
-rw-r--r-- 1 root root 382M Apr 18 12:52 x.npy

However, this is a bit of a special case: Blosc does extremely well on
the linspace data, in terms of both speed and size, so your mileage may
vary.
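
To illustrate why the choice of data matters so much, here is a minimal
sketch that compares how well a smooth linspace array and a random array
compress. Note the assumptions: it uses zlib from the standard library plus
a hand-written byte shuffle as a rough stand-in for Blosc's shuffle filter.
It is not what Bloscpack does internally, and the helper names
byte_shuffle and compressed_ratio are made up for this example.

```python
import zlib

import numpy as np


def byte_shuffle(arr):
    """Group the k-th byte of every element together.

    Hypothetical helper: a simple imitation of a byte-shuffle filter,
    not Blosc's actual implementation.
    """
    raw = np.ascontiguousarray(arr).view(np.uint8).reshape(-1, arr.itemsize)
    # After the transpose, row k holds byte k of every element.
    return np.ascontiguousarray(raw.T).tobytes()


def compressed_ratio(arr, shuffle=True):
    """Return compressed size divided by raw size (smaller is better)."""
    payload = byte_shuffle(arr) if shuffle else arr.tobytes()
    # zlib level 1 is an assumption, chosen only to keep the sketch fast.
    return len(zlib.compress(payload, 1)) / arr.nbytes


n = 200_000  # much smaller than the 50M elements timed above
smooth = np.linspace(1, 10, n)
noisy = np.random.default_rng(0).random(n)

for name, data in (("linspace", smooth), ("random", noisy)):
    print(name,
          "plain: %.2f" % compressed_ratio(data, shuffle=False),
          "shuffled: %.2f" % compressed_ratio(data, shuffle=True))
```

The exact ratios depend on the compressor and its settings, but the
linspace array should come out clearly smaller than the random one once
shuffled, since its high-order bytes change very slowly from one element to
the next.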

best,

V-