[SciPy-User] HDF4, HDF5, netcdf solutions -- PyNIO/PyNGL or CDAT or ??

Andrew Collette andrew.collette at gmail.com
Mon Nov 1 23:40:47 EDT 2010


Hi everyone,

I'm the author of h5py (although I haven't posted here in a while).

> I think the above is a good way to express the main difference between
> h5py and PyTables.  But, unfortunately, many wrong beliefs about
> packages that are similar in functionality spread on the Internet without a
> solid reason behind them.  I suppose this is a consequence of the
> propagation of information in multi-user channels.  Unfortunately,
> fighting these myths is not always easy.

I think this is partially my fault for not making h5py's purpose
clearer in the beginning.  From my perspective, h5py is trying to be a
"native" (as close as possible) Python/NumPy interface to the HDF5
library, while adding as little as possible.  That means it doesn't
have any of the advanced indexing features of PyTables, or the
database metaphor (Francesc, reel me in if I'm getting out of bounds
here or below).  There are also some types which are unsupported, like
the NumPy unicode type, because I couldn't think of a way to map them
correctly.  You can find a complete list of supported/unsupported
types in the h5py FAQ
(http://code.google.com/p/h5py/wiki/FAQ#What_datatypes_are_supported?).
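
To make the "native NumPy interface" idea concrete, here is a minimal
sketch of the high-level API. It assumes a reasonably current h5py and
NumPy; the file, group and dataset names are made up for illustration,
and the "core" driver is used only so that nothing is written to disk.

```python
import numpy as np
import h5py

# In-memory file: with backing_store=False the "core" driver never touches disk.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    # A dataset is created straight from a NumPy array and keeps its shape/dtype.
    data = np.arange(12, dtype="f8").reshape(3, 4)
    dset = f.create_dataset("temperature", data=data)
    shape, dtype = dset.shape, dset.dtype

    # NumPy-style slicing reads only the selected part back as an array.
    row = dset[1, :]

    # Groups behave like dictionaries; assigning an array creates a dataset.
    grp = f.create_group("run1")
    grp["counts"] = np.array([1, 2, 3], dtype="i4")
    names = sorted(f.keys())
```

There is no table or query layer here: what you get is HDF5 groups and
datasets, read and written as NumPy arrays.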

However, h5py provides a number of nifty things, including support for
object and region references, automatic exception translation between
HDF5 and Python (i.e. HDF5 itself can raise IOError, etc.), thread
support, and a very broad *low-level* interface to HDF5, in addition
to the NumPy-like high-level interface:

http://h5py.alfven.org/docs/api/index.html

This interface is mainly of interest if you're an HDF5 weenie, or have
very, very, very specific requirements for how to write your files.
It's also the foundation on which the friendlier high-level interface
is built.
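
A short sketch of the features listed above (references, exception
translation, and dropping down to the low-level layer), again assuming
a reasonably current h5py. All names are illustrative, and the exact
exception classes may differ between h5py versions.

```python
import os
import tempfile

import numpy as np
import h5py

with h5py.File("refs_sketch.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("samples", data=np.arange(10))

    # Object reference: a pointer to the dataset itself.
    obj_ref = dset.ref
    target_name = f[obj_ref].name

    # Region reference: a pointer to a selection inside the dataset.
    region_ref = dset.regionref[2:5]
    selected = dset[region_ref]

    # Exception translation: a failed HDF5 lookup surfaces as KeyError.
    try:
        f["no_such_member"]
        missing_error = None
    except KeyError as exc:
        missing_error = type(exc).__name__

    # Low-level layer: dset.id wraps the underlying HDF5 dataset identifier.
    dims = dset.id.get_space().get_simple_extent_dims()

# A failed open surfaces as IOError (an alias of OSError on Python 3).
missing_path = os.path.join(tempfile.mkdtemp(), "does_not_exist.h5")
try:
    h5py.File(missing_path, "r")
    open_error = None
except IOError:
    open_error = "IOError"
```

The high-level objects all expose their low-level counterpart through
the .id attribute, so the two layers can be mixed in the same program.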

As far as compatibility goes, I would be very surprised if PyTables
files are much "better" or "worse" than h5py files.  Generally that sort of
thing is due to changes in HDF5 itself, for example going from HDF5
1.6 to 1.8, or the knobs, features and anti-features in the
various releases.  The extra attributes that PyTables writes are also a
bit of a red herring, although to toot my own horn I should point out
that one of the explicit design goals for h5py is to never touch
"user-owned spaces" like attributes or group entries.  I can't imagine
those attributes having a practical effect.  In any case, as Francesc
points out, PyTables lets
you control what gets written, or turn it off completely.
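
If you want to check the "user-owned spaces" point on your own files, a
minimal sketch along these lines should do, assuming a reasonably
current h5py (the dataset name and the "units" attribute are just
examples):

```python
import numpy as np
import h5py

with h5py.File("attrs_sketch.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("data", data=np.zeros(4))

    # Nothing has been added on h5py's behalf: no attributes, no extra members.
    root_attrs_before = list(f.attrs.keys())
    dset_attrs_before = list(dset.attrs.keys())
    members = list(f.keys())

    # Attributes appear only when the user writes them.
    dset.attrs["units"] = "kelvin"
    dset_attrs_after = list(dset.attrs.keys())
    units = dset.attrs["units"]
```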

> Frankly, I think the best rationale here is more a matter of trying out
> the different packages and choosing the one you like the most.  This is
> one of the beauties of free software: easy trying.

Well said!  I should also point out that Francesc and I have shared
code and suggestions in the past, which is another great thing about
free software.  In fact, h5py started off using the PyTables Pyrex
definitions!  It certainly saved me lots of typing. :)

Andrew


