[SciPy-user] HDF5 vs FITS (was: Fast saving/loading of huge matrices)

Francesc Altet faltet at carabos.com
Sun Apr 22 17:08:42 EDT 2007


On Sun, 22/04/2007 at 10:23 -0400, Perry Greenfield wrote:
> On Apr 22, 2007, at 6:02 AM, Francesc Altet wrote:
> 
> > Hi Perry,
> >
> > On Fri, 20/04/2007 at 17:14 -0400, Perry Greenfield wrote:
> >>
> >> I think that is a bit too broadly posed to answer in any simple way
> >> (if you are wondering how HDF and FITS compare). Speed? Flexibility?
> >> Etc. FITS is generally much less flexible. However, it is archival.
> >> Something that HDF has a harder time claiming. And it is very well
> >> entrenched in astronomy.
> >
> > Sorry for my ignorance, but can you explain what the term 'archival'
> > means in this context?  I suppose that it has a very concrete meaning,
> > but I can't see why a flexible format like HDF5 is not appropriate for
> > archival (in the general sense of the term) purposes.
> >
> What I meant was that FITS was defined in terms of the actual binary  
> representation (originally on tape, but now generalized to other  
> storage). The idea being that once written as a FITS file, it would  
> always be supported in the future as a format. With HDF, the focus  
> (as I understood it) was on the software interface, and that the  
> binary representation used may change. And for HDF that binary  
> representation has changed over time (again, as I understand it,  
> perhaps I've been misinformed). That kind of variability is a real  
> killer for archival purposes. Is HDF5 considered stable enough that  
> no future changes are envisioned? (And have they guaranteed to  
> support HDF5 indefinitely even if new enhancements are proposed?).  
> That's what it would take to be accepted as an archival format.

Ok. Thanks for the explanation.  Well, my impression is that the THG
(The HDF Group) people are trying hard to stick with a stable version of
the format.  In fact, the latest incarnation of HDF5 (1.8.0, in beta
stage now) claims that it is able to read files from *all* the previous
versions of HDF5.  From the "What's New" announcement of 1.8.0 [1]:

"""
Backward and Forward Format Compatibility:

The HDF5 Release 1.8.0 library will read all existing HDF5 files, from
this or any prior release.  Although this release contains features that
require additions and/or changes to the HDF5 file format, by default
this release will write out files that conform to a "maximum
compatibility" principle.  That is, files are written with the earliest
version of the file format that describes the information, rather than
always using the latest version possible.  This provides the best
forward compatibility by allowing the maximum number of older versions
of the library to read files produced with this release.

If library features are used that require new file format features, or
if the application requests that the library write out only the latest
version of the file format, the files produced with this version of the
library may not be readable by older versions of the HDF5 library.
"""

So not only is backward compatibility important to them, but forward
compatibility is as well (which could also matter for archival purposes).
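The "maximum compatibility" rule quoted above amounts to a small decision
procedure: write the earliest format version able to describe every feature
actually used, unless the application explicitly asks for the latest one.
A minimal sketch of that logic, assuming a hypothetical feature-to-version
table (the feature names and version numbers below are invented for
illustration; the real library tracks versions per object, not as one
global number):

```python
# Hypothetical table: minimum file-format version needed to describe a
# feature. Names and numbers are illustrative, not real HDF5 values.
MIN_VERSION = {
    "plain_dataset": 0,
    "plain_group": 0,
    "new_feature_x": 2,
}
LATEST_VERSION = 2


def choose_format_version(features, want_latest=False):
    """Pick the version to write under a 'maximum compatibility' policy."""
    if want_latest:
        # The application explicitly requested the newest format.
        return LATEST_VERSION
    # Otherwise use the earliest version that can describe all features used.
    return max((MIN_VERSION[f] for f in features), default=0)


def readable_by(reader_max_version, written_version):
    """An older library can read the file only if it knows that version."""
    return reader_max_version >= written_version


if __name__ == "__main__":
    v = choose_format_version(["plain_dataset", "plain_group"])
    print(v, readable_by(1, v))   # old readers still work
    v = choose_format_version(["plain_dataset", "new_feature_x"])
    print(v, readable_by(1, v))   # new feature forces a newer format
```

The point of the default is that a file only "pays" for a newer format when
it really uses a feature that needs it, which is what keeps older readers
working.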

Furthermore, they maintain a fairly complete FAQ [2] on bugs in previous
releases that can break this backward/forward compatibility, along with
suggestions and workarounds (where known/possible) for coping with them.

This is not to say that HDF5 is completely free of issues for archival
purposes, but at least its developers seem to try hard to avoid them (or
to provide workarounds when problems do occur).

Cheers,

[1] http://www.hdfgroup.uiuc.edu/HDF5/doc_1.8pre/WhatsNew180.html
[2] http://www.hdfgroup.org/HDF5/faq/bkfwd-compat.html

-- 
Francesc Altet    |  Be careful about using the following code --
Carabos Coop. V.  |  I've only proven that it works, 
www.carabos.com   |  I haven't tested it. -- Donald Knuth