[SciPy-user] Fast saving/loading of huge matrices
Gael Varoquaux
gael.varoquaux at normalesup.org
Fri Apr 20 02:24:20 EDT 2007
I agree that pytables lacks a really simple interface: say, something that
dumps a dict to an hdf5 file, and vice versa (although hdf5 -> dict is a
bit harder, as not all hdf5 types convert nicely to Python types).
In my experiment I use this code to load the data:
"""
import tables
# CharArray is assumed to be imported elsewhere; its origin is not shown here.

def load_h5(file_name):
    """ Loads an hdf5 file and returns a dict with the hdf5 data in it.
    """
    file = tables.openFile(file_name)
    out_dict = {}
    for key, value in file.leaves.iteritems():
        # Skip nodes that pytables cannot represent.
        if isinstance(value, tables.UnImplemented):
            continue
        try:
            value = value.read()
            try:
                if isinstance(value, CharArray):
                    value = value.tolist()
            except Exception, inst:
                print "Couldn't convert %s to a list" % key
                print inst
            # Unwrap single-element containers.
            if len(value) == 1:
                value = value[0]
            # Drop the leading '/' of the node path to get the key.
            out_dict[key[1:]] = value
        except Exception, inst:
            print "couldn't load %s" % key
            print inst
    file.close()
    return out_dict
"""
It works well on our files, but our files are produced by code I wrote,
so they do not explore all the possibilities of hdf5.
Similarly, I have some Python code to dump a dict of arrays to an hdf5
file:
"""
import tables
from numpy import ndarray

def dic_to_h5(filename, dic):
    """ Saves all the arrays in a dictionary to an hdf5 file.
    """
    out_file = tables.openFile(filename, mode="w")
    for key, value in dic.iteritems():
        # Only arrays are stored; other values are silently skipped.
        if isinstance(value, ndarray):
            out_file.createArray('/', str(key), value)
    out_file.close()
"""
This code is not general enough to go in pytables, but if the list wants
to improve it a bit, then we could propose it for inclusion, or at least
put it on the cookbook.
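
For readers on current releases, the two helpers above might be sketched as
follows. This is only a minimal sketch, not the code from this post: it
assumes Python 3, NumPy, and PyTables 3.x, where open_file, create_array and
iter_nodes replace the camelCase calls used above. The names dict_to_h5 and
load_h5_arrays are illustrative, and only arrays stored directly under the
root group are handled.

```python
import numpy as np
import tables


def dict_to_h5(filename, arrays):
    """Save every NumPy array in a dict as a top-level HDF5 array."""
    with tables.open_file(filename, mode="w") as h5:
        for key, value in arrays.items():
            # Non-array values are skipped, as in the original helper.
            if isinstance(value, np.ndarray):
                h5.create_array("/", str(key), value)


def load_h5_arrays(filename):
    """Read every array stored under the root group back into a dict."""
    out = {}
    with tables.open_file(filename, mode="r") as h5:
        # Only direct children of '/' that are Array nodes are visited.
        for node in h5.iter_nodes("/", classname="Array"):
            out[node.name] = node.read()
    return out
```

With the example from the question below, dict_to_h5('channels.h5',
{'v1': v1, 'v2': v2}) followed by load_h5_arrays('channels.h5')['v1'] should
give back the first channel (the file name is made up for illustration).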
Cheers,
Gaël
On Thu, Apr 19, 2007 at 06:01:44PM -0500, Ryan Krauss wrote:
> I have a very similar question. Pytables clearly has much more
> capability than I need and the documentation is a bit intimidating. I
> have tests that involve multiple channels of data that I need to
> store. Can you give a simple example of using pytables to store 3
> separate Nx1 vectors in the same file and easily retrieve the
> individual channels? The cPickle equivalent would be something like:
> v1=rand(1000,)
> v2=rand(1000,)
> mydict={'v1':v1,'v2':v2}
> and then dump mydict to a pickle file. How would I do the same thing
> in pytables?
> Thanks,
> Ryan
> On 4/19/07, Vincent Nijs <v-nijs at kellogg.northwestern.edu> wrote:
> > Pytables looks very interesting and clearly has a ton of features. However,
> > if I am trying to just read-in a csv file can it figure out the correct data
> > types on its own (e.g., dates, floats, strings)? Read "I am too lazy to
> > type in variable names and types myself if the names are already in the
> > file" :)
> > Similarly can you just dump a dictionary or rec-array into a pytable with
> > one 'save' command and have pytables figure out the variable names and
> > types? This seems relevant since you wouldn't have to do that with cPickle
> > which saves user-time if not computer time.
> > Sorry if this is too off-topic.
> > Vincent