[Numpy-discussion] reading gzip compressed files using numpy.fromfile

Peter Schmidtke pschmidtke at mmb.pcb.ub.es
Wed Oct 28 15:31:43 EDT 2009


Dear Numpy Mailing List Readers,

I have a fairly simple problem for which I have not found a solution so far.
I have a gzipped file that has some numbers stored in it, and I want to read
them into a numpy array as fast as possible, but only one chunk of data at a
time.
So I would like to use numpy's fromfile function.

At the moment my code looks roughly like this:



    import gzip
    import numpy as npy

    f = gzip.open("myfile.gz", "r")
    xyz = npy.fromfile(f, dtype="float32", count=400)


The idea is to read 400 entries from the file, keep it open, process the data,
then come back and read the next 400 entries. When I do this, numpy complains
that f is not a regular file handle:
IOError: first argument must be an open file

In fact f is a gzip file object rather than a plain file handle, but gzip
gives access to the underlying file handle through f.fileobj.

So I tried  xyz=npy.fromfile(f.fileobj,dtype="float32",count=400)

But then I get only meaningless values (not the actual data), and when I
specify the sep=" " argument to npy.fromfile I get just .1 and nothing
else.

Can you tell me why this happens and how to fix it? I know that I could read
everything into memory, but these files are rather big, so I really have to
avoid that.
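
For reference, here is a minimal sketch of one possible way to read such a
file chunk by chunk. It assumes the file holds raw binary float32 values with
no header and in native byte order, which may not match the real file. The
helper name read_chunks is hypothetical. The idea is to let the gzip object
do the decompression via f.read() and then convert the bytes with
numpy.frombuffer, since f.fileobj only exposes the still-compressed bytes on
disk, which would explain the meaningless values.

```python
import gzip

import numpy as np


def read_chunks(path, count=400, dtype="float32"):
    """Yield arrays of up to `count` values decompressed from a gzip file.

    Hypothetical helper: assumes raw binary values without any header.
    """
    dt = np.dtype(dtype)
    nbytes = count * dt.itemsize
    with gzip.open(path, "rb") as f:
        while True:
            # Decompress only the next chunk, not the whole file.
            buf = f.read(nbytes)
            # Drop an incomplete trailing value, if the file ends mid-item.
            usable = len(buf) - len(buf) % dt.itemsize
            if usable == 0:
                break
            # frombuffer returns a read-only view; call .copy() to modify it.
            yield np.frombuffer(buf[:usable], dtype=dt)
```

Used as "for xyz in read_chunks("myfile.gz", count=400): ...", this keeps the
file open between chunks and holds only one chunk in memory at a time.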

Thanks in advance.


-- 

Peter Schmidtke

----------------------
PhD Student at the Molecular Modeling and Bioinformatics Group
Dep. Physical Chemistry
Faculty of Pharmacy
University of Barcelona



