[Numpy-discussion] Home for pyhdf5io?

David Warde-Farley dwf at cs.toronto.edu
Sun May 24 18:31:43 EDT 2009


On 24-May-09, at 5:22 PM, Robert Kern wrote:

>> While I haven't tried Andrew Collette's h5py
>> (http://code.google.com/p/h5py), it looks like a very 'thin' wrapper
>> around the HDF5 C libraries. Maybe numpy's save(), savez(), load(),
>> memmap() could be enhanced so that saving/loading files with HDF5-like
>> file extensions used the HDF5 format, with code based on h5py and
>> pyhdf5io. This could, I imagine, be a relatively small/simple addition
>> to numpy, with the only external dependency being the HDF5 libraries
>> themselves.
>
> *libhdf5* is too big, not PyTables.

Yup. According to sloccount, numpy is roughly 210,000 lines of code.
The hdf5 library is about 385,000 lines. Including even a small part of
libhdf5 would grow the code base significantly, and requiring it as a
dependency isn't a good idea since libhdf5 can be tricky to build correctly.

As Robert's design document for the NPY format says, one option would
be to implement a minimal subset of the HDF5 format *from scratch*
(just enough to save NumPy arrays as top-level leaf nodes, for
example). This would also sidestep any tricky licensing issues: I don't
know the details of the HDF5 license, and while I believe it's fairly
permissive, it still might not be suitable for including any of its
code in NumPy.
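
For what it's worth, the extension-based dispatch suggested in the quoted
message could be sketched roughly as below. This is only an illustration:
pick_format() and the HDF5_EXTENSIONS set are hypothetical names, the list
of extensions is my assumption, and nothing like this exists in numpy. It
only decides which format a filename implies; the actual HDF5 writing
(via h5py, libhdf5, or a from-scratch subset) is left out.

```python
import os

# Hypothetical set of "HDF5-like" extensions; this list is an assumption
# made for illustration, not something numpy or HDF5 defines.
HDF5_EXTENSIONS = {".h5", ".hdf5", ".hdf"}


def pick_format(filename):
    """Return the name of the on-disk format implied by the file extension."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in HDF5_EXTENSIONS:
        return "hdf5"
    if ext == ".npz":
        return "npz"
    # Anything else falls back to numpy's native single-array format.
    return "npy"
```

A save() wrapper could call pick_format() first and only import an HDF5
backend when the result is "hdf5", so that plain .npy/.npz users never
need libhdf5 installed.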

David


