[SciPy-User] Maximum file size for .npz format?

Paul Anton Letnes paul.anton.letnes at gmail.com
Fri Mar 12 13:18:18 EST 2010


On 12. mars 2010, at 09.29, Gökhan Sever wrote:

> 
> 
> On Fri, Mar 12, 2010 at 11:22 AM, Paul Anton Letnes <paul.anton.letnes at gmail.com> wrote:
> 
> On 11. mars 2010, at 23.50, Lafras Uys wrote:
> 
> >
> >>> I need to save a fairly large set of arrays to disk. I have saved it using
> >>> numpy.savez, and the resulting file is around 11Gb (yes, I did say fairly
> >>> large ;D). When I try to load it using numpy.load, the zipfile module
> >>> complains about
> >>> BadZipfile: Bad magic number for file header
> >>>
> >>> I can't open it with the normal zip utility present on the system, but it
> >>> could be that it's barfing about files being larger than 2Gb.
> >>> Is there some file limit for npzs?
> >>
> >> Yes, the ZIP file format has a 4GB limit. Unfortunately, Python does
> >> not yet support the ZIP64 format.
> >>
> >>> Is there any way I can recover the data (I
> >>> guess I could try decompressing the file with 7z and extracting the
> >>> individual npy files?)
> >>
> >> Possibly. However, if the normal zip utility isn't working, 7z
> >> probably won't, either. Worth a try, though.
> >
> > I've had similar problems, my solution was to move to HDF5. There are
> > two options for accessing and working with HDF files from python: h5py
> > (http://code.google.com/p/h5py/) and pytables
> > (http://www.pytables.org/). Both packages have built in numpy support.
> >
> > Regards,
> > Lafras
> 
> I've experienced similar issues too, but I moved to NetCDF. The only disadvantage was that I did not find any Python modules that both work well _and_ support numpy. Hence, I am considering moving to HDF5. Which Python module would people here recommend? (Or, alternatively, did I miss a great NetCDF Python module that someone could tell me about?)
> 
> Cheers,
> Paul.
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
> 
> There is http://code.google.com/p/netcdf4-python/
> 
> I know NetCDF4 is a subset of HDF5. What advantages are there to using HDF5 rather than NetCDF4?
> 
> 
> -- 
> Gökhan

I don't know of any particular advantages of the file format itself. There are, however, several Python modules for HDF5 that use numpy. Your suggestion for a NetCDF module might be a good one, but it does not build on my system: it finds only the HDF5 library and not the NetCDF library, even though they reside in the same folder. I'll see if it works out eventually!

-Paul
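
For anyone checking whether an archive runs into the ZIP size limit discussed in this thread, here is a minimal sketch using only the Python standard library. It assumes a recent Python 3, where zipfile.ZipFile accepts allowZip64 (enabled by default since Python 3.4); that is newer than the Python versions discussed above. The helper name describe_archive, the constant ZIP32_LIMIT, and the file names are hypothetical and chosen only for illustration. The demo archive holds placeholder bytes, not real arrays.

```python
import os
import tempfile
import zipfile

# Classic (non-ZIP64) archives store sizes and offsets in 32-bit fields,
# so anything at or beyond roughly 4 GiB needs the ZIP64 extension.
ZIP32_LIMIT = 2**32 - 1


def describe_archive(path):
    """Return basic facts about an archive file (hypothetical helper)."""
    size = os.path.getsize(path)
    return {
        "size_bytes": size,
        "exceeds_zip32_limit": size > ZIP32_LIMIT,
        "readable_as_zip": zipfile.is_zipfile(path),
    }


# Build a tiny demonstration archive; a real .npz would hold .npy members.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.zip")
with zipfile.ZipFile(demo_path, "w", allowZip64=True) as zf:
    zf.writestr("a.npy", b"placeholder bytes, not a real array")

info = describe_archive(demo_path)
print(info)
```

Pointing describe_archive at an existing .npz file shows whether it is past the classic limit and whether zipfile still recognises it as a ZIP archive. It does not attempt any recovery.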



