[Numpy-discussion] data exchange format

Gabriel J.L. Beckers beckers at orn.mpg.de
Tue May 20 13:06:29 EDT 2008


I am not exactly an expert on data storage, but I use PyTables a lot for
all kinds of scientific data sets and am very happy with it. It does
have many advanced capabilities, so it may seem like overkill at first
glance. But for simple tasks such as the one you describe, the API is
simple; I also use it for small data sets because it is such a quick way
of storing data portably. Regarding speed and overhead: I don't
know in general what the penalties or gains are for very small files. On
my system an empty file is 1032 bytes, and if I fill it with a 3 by 30000
array of random float64 values it is 723080 bytes. Not so bad.
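
As a rough check on those numbers, the raw payload of such an array can be
computed directly. This is only a sketch: it takes the 723080-byte figure
quoted above as given and subtracts the array's in-memory size from it.

```python
import numpy

# Same shape and dtype as the array described above.
ta = numpy.random.random((3, 30000))

raw_bytes = ta.nbytes              # 3 * 30000 values * 8 bytes each
file_bytes = 723080                # file size reported in the message
overhead = file_bytes - raw_bytes  # bytes added by the HDF5 container

print(raw_bytes, overhead, round(100.0 * overhead / raw_bytes, 2))
```

So the container adds roughly 3 KB to 720000 bytes of raw data, well under
one percent.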

Just try it out yourself:

>>> import numpy, tables
>>> ta = numpy.random.random((3, 30000))
>>> # PyTables 3.x names; older releases spelled these openFile and createArray
>>> f = tables.open_file('test.h5', 'w')
>>> f.create_array('/', 'testarray', ta)
>>> f.close()

With most real data, the file can be smaller because you have the
option of enabling compression.
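
A minimal sketch of what enabling compression might look like, assuming the
PyTables 3.x spellings (open_file, create_array, create_carray) and a
made-up, highly repetitive array. The data, file names, and compression
level are illustrative assumptions; random floats like the ones in the
session above would compress poorly.

```python
import os
import tempfile

import numpy
import tables

# Hypothetical, highly repetitive data chosen so that compression has an effect.
data = numpy.tile(numpy.linspace(0.0, 1.0, 100), (3, 300))  # shape (3, 30000)

tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, 'plain.h5')
packed_path = os.path.join(tmpdir, 'packed.h5')

# Uncompressed array, for comparison.
with tables.open_file(plain_path, 'w') as f:
    f.create_array('/', 'testarray', data)

# Chunked array with zlib compression enabled through a Filters object.
filters = tables.Filters(complevel=5, complib='zlib')
with tables.open_file(packed_path, 'w') as f:
    f.create_carray('/', 'testarray', obj=data, filters=filters)

plain_size = os.path.getsize(plain_path)
packed_size = os.path.getsize(packed_path)
print(plain_size, packed_size)
```

How much is saved depends entirely on the data, so it is worth comparing
the two file sizes on a representative sample.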

But I must admit that I haven't tried reading HDF5 in Matlab or C (and
never will); I know it is possible, but I don't know how difficult it
is.

Cheers, Gabriel

On Tue, 2008-05-20 at 12:11 -0400, Gary Pajer wrote:
> On Tue, May 20, 2008 at 10:26 AM, Gabriel J.L. Beckers
> <beckers at orn.mpg.de> wrote:
> > PyTables is an efficient way of doing it (http://www.pytables.org). You
> > essentially write data to an HDF5 file, which is portable and can be read
> > in Matlab or in a C program (using the HDF5 library).
> >
> > Gabriel
> 
> I thought about that.  It seems to have much more than I need, so I
> wonder if it has more overhead, less speed, or a more complex API than
> I need.  Big isn't necessarily bad, but it might be.  Is PyTables
> overkill?
> 
> 
> >
> > On Tue, 2008-05-20 at 09:32 -0400, Gary Pajer wrote:
> >> I want to store data in a way that can be read by a C or Matlab program.
> >>
> >> Not too much data, not too complicated:  a dozen or so floats, a few
> >> integers, a few strings, and a (3, x) numpy array where typically 500
> >> < x < 30000.
> >>
> >> I was about to create my own format for storage when it occurred to me
> >> that I might want to use XML or some other standard format.  Like
> >> JSON, perhaps.   Can anyone comment, esp relating to numpy
> >> implementation issues, or offer suggestions?
> >>
> >> Thanks,
> >> Gary
> >> _______________________________________________
> >> Numpy-discussion mailing list
> >> Numpy-discussion at scipy.org
> >> http://projects.scipy.org/mailman/listinfo/numpy-discussion




