[Numpy-discussion] Home for pyhdf5io?
David Warde-Farley
dwf at cs.toronto.edu
Sun May 24 18:31:43 EDT 2009
On 24-May-09, at 5:22 PM, Robert Kern wrote:
>> While I haven't tried Andrew Collette's h5py
>> (http://code.google.com/p/h5py), it looks like a very 'thin' wrapper
>> around the HDF5 C libraries. Maybe numpy's save(), savez(), load(),
>> memmap() could be enhanced so that saving/loading files with
>> HDF5-like file extensions used the HDF5 format, with code based on
>> h5py and pyhdf5io. This could, I imagine, be a relatively
>> small/simple addition to numpy, with the only external dependency
>> being the HDF5 libraries themselves.
>
> *libhdf5* is too big, not PyTables.
Yup. According to sloccount, numpy is roughly 210,000 lines of code.
The hdf5 library is ~385,000 lines. Including even a small part of
libhdf5 would grow the code base significantly, and requiring it as a
dependency isn't a good idea since libhdf5 can be tricky to build right.
As Robert's design document for the NPY format says, one option would
be to implement a minimal subset of the HDF5 file format *from scratch*
(just enough to save NumPy arrays as top-level leaf nodes, for
example). This would also sidestep any tricky licensing issues. I
don't know the details of the HDF5 license; I know it's fairly
permissive, but it still might not be suitable for including any of
libhdf5 in NumPy.
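
To make the dependency question concrete, here is a rough sketch of the
extension-based dispatch proposed in the quoted text. It is only an
illustration, not anything that exists in numpy: pick_format(),
save_array() and the HDF5_EXTENSIONS set are hypothetical names, and the
list of "HDF5-like" extensions is my own guess. The HDF5 branch leans on
h5py (and therefore on libhdf5), which is exactly the dependency at
issue; the imports are done lazily so the plain NPY path doesn't need it.

```python
import os

# Hypothetical set of "HDF5-like" extensions; the thread does not list any.
HDF5_EXTENSIONS = {".h5", ".hdf5"}


def pick_format(filename):
    """Return 'hdf5' or 'npy' based on the file extension alone."""
    ext = os.path.splitext(filename)[1].lower()
    return "hdf5" if ext in HDF5_EXTENSIONS else "npy"


def save_array(filename, arr, name="array"):
    """Save one array, dispatching on the extension (illustrative only)."""
    if pick_format(filename) == "hdf5":
        import h5py  # optional dependency; this is what pulls in libhdf5

        with h5py.File(filename, "w") as f:
            # Store the array as a single top-level dataset (a leaf node).
            f.create_dataset(name, data=arr)
    else:
        import numpy as np

        # Plain NPY path; np.save appends ".npy" if it is missing.
        np.save(filename, arr)
```

With something like this, save_array("x.h5", a) would only work where
h5py and libhdf5 are installed, while save_array("x.npy", a) needs
nothing beyond numpy itself.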
David