[SciPy-User] HDF4, HDF5, netcdf solutions -- PyNIO/PyNGL or CDAT or ??

Francesc Alted faltet at pytables.org
Mon Nov 1 18:02:37 EDT 2010


On Monday 01 November 2010 21:19:02 Robert Kern wrote:
> On Mon, Nov 1, 2010 at 15:01, Dav Clark <dav at alum.mit.edu> wrote:
> > On Nov 1, 2010, at 5:14 AM, Zachary Pincus wrote:
> >> Where pytables tries to present its own interface, h5py just gives
> >> you the hdf5 file. This means that pytables can do a lot of neat
> >> things (like the indexed searching), but it also means that (at
> >> least last I checked) pytables isn't the best tool for reading in
> >> hdf5 files not created by pytables -- for that, you'd want h5py.
> > 
> > Every time I've had an issue with pytables reading a non-pytables
> > created file, I've submitted a bug and it got fixed usually in a
> > few days. At the time, I was using HDF5 as a transfer layer
> > between matlab's rudimentary hdf5 support and python w/ pytables.
> > (Thanks Francesc!)
> 
> I just wanted to add that in my experience, you can read just about
> any HDF5 file with PyTables except for a few with some more exotic
> features.

Let me chime in just to try to clarify a couple of things.  First, both
PyTables and h5py can read most of the HDF5 files out there, but neither
of them has *complete* support for HDF5 files (implementing complete
support for the whole HDF5 standard is really a tough task).  In
addition, the last time that I checked this (about one year ago, so
things might have changed since then), PyTables could read (and create)
HDF5 files that h5py could not, and the converse was true too.

> If you absolutely need to write an HDF5 file according to a
> strict standard without any extra bits, you may need h5py. However,
> many other readers of your standard probably won't care about the
> extra bits PyTables includes.

I suppose that the 'extra bits' you are referring to are the HDF5
attributes that complement HDF5 nodes as metainfo.  Let me say that most
of these attributes are not PyTables-specific, but are the ones used in
the high-level API of HDF5 (http://www.hdfgroup.org/HDF5/doc/HL/).
Anyway, as I have said many times, if these attributes are causing some
trouble to the user (they should not), you can always disable their
creation by setting the PYTABLES_SYS_ATTRS parameter to false when
opening a file (or, if you would like this to be permanent, in
tables/parameters.py).  For more info about this, see:

http://www.pytables.org/docs/manual/apc.html#id364726
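
As a minimal sketch of the per-file setting, assuming a PyTables 3.x
install (where the functions are spelled open_file and create_array; in
the 2.x series current at the time of this message they were openFile
and createArray), the parameter can be passed as a keyword argument when
opening the file.  The file name and the helper function are made up for
illustration:

```python
import os
import tempfile

import tables  # PyTables; assumes version 3.0 or later


def system_attr_names(sys_attrs):
    """Create a small file and return the system attribute names of its array."""
    # Hypothetical scratch file, used only for this illustration.
    path = os.path.join(tempfile.mkdtemp(), "demo.h5")
    # Keyword arguments here override the defaults in tables/parameters.py
    # for this file only.
    with tables.open_file(path, mode="w", PYTABLES_SYS_ATTRS=sys_attrs) as h5file:
        array = h5file.create_array(h5file.root, "data", [1, 2, 3])
        return list(array.attrs._v_attrnamessys)


with_attrs = system_attr_names(True)
without_attrs = system_attr_names(False)
print("default :", with_attrs)
print("disabled:", without_attrs)
```

Comparing the two printed lists should show which attributes PyTables
adds by default and confirm that they are absent when the parameter is
turned off.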

> You just have to be a little bit
> careful to make sure that you aren't relying on any PyTables
> features, like Python-pickled attributes.

PyTables only uses pickle when trying to save attributes that are not 
supported by HDF5 (with the exception of unicode strings that should be 
implemented soon in PyTables).  For example, if you try to save a list 
as an attribute:

node.attrs.my_attr = [1,2,[3,4]]

then, since such a list cannot be represented by HDF5 natively, PyTables
chooses to pickle it and save it.  During retrieval, the pickle is
automatically detected and unpickled before being returned to the user.
Of course, you will not be able to read such attributes with a
non-Python application.  And, although I consider this a feature, I can
understand that this might be considered a bug by others (but I have to
say that very few PyTables users, if any at all, have ever complained
about this 'feature'/'bug').
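
The save-and-restore behaviour described above can be modelled with
nothing but the standard library.  This is only a toy sketch of the
idea, not PyTables code: the dict-based store, the tag strings, the
NATIVE_TYPES tuple and both helper functions are assumptions made for
illustration, and PyTables' real detection logic differs.

```python
import pickle

# Hypothetical stand-in for the types an attribute store can hold natively.
NATIVE_TYPES = (int, float, bytes)


def store_attr(store, name, value):
    """Keep natively representable values as-is; pickle everything else."""
    if isinstance(value, NATIVE_TYPES):
        store[name] = ("native", value)
    else:
        store[name] = ("pickled", pickle.dumps(value))


def load_attr(store, name):
    """Return the value, transparently unpickling it when it was pickled."""
    kind, payload = store[name]
    return pickle.loads(payload) if kind == "pickled" else payload


attrs = {}
store_attr(attrs, "my_attr", [1, 2, [3, 4]])  # nested list: gets pickled
store_attr(attrs, "count", 7)                 # plain int: stored natively

restored = load_attr(attrs, "my_attr")  # the Python reader gets the list back
raw = attrs["my_attr"][1]               # what a non-Python reader would see
```

The raw payload is an opaque byte string in Python's pickle format,
which is why an application written in another language cannot make
sense of such an attribute.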

Hope this helps clarify some points,

-- 
Francesc Alted


