get uniform binary data into array for further processing

Chris Barker chrishbarker at home.net
Mon Oct 15 15:24:25 EDT 2001


chr_w at gmx.de wrote:
> I have to import pretty massive binary files containing double-precision
> real values (climate data) so I can do some operations to it (mainly
> simplifying, getting the daily sums, etc.). The file can get really ugly in size... up to
> a gig eventually...

I do this a lot, but my files are usually under 1 MB; as others have
pointed out, you may have to deal with yours in chunks.
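A chunked reader can be sketched with just the standard library; the
helper name and chunk size here are my own invention, not anything from
a particular package:

```python
import struct

def read_doubles_in_chunks(f, chunk_doubles=65536):
    """Yield tuples of C doubles from an open binary file, one
    chunk at a time, so the whole file never sits in memory."""
    bytes_per_chunk = 8 * chunk_doubles   # a C double is 8 bytes
    while True:
        data = f.read(bytes_per_chunk)
        if not data:
            break
        n = len(data) // 8                # last chunk may be short
        yield struct.unpack('%dd' % n, data[:n * 8])
```

Each chunk can then be summed or averaged before the next one is read.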

> But my main concern:
> a) what's the best and fasted way to get this data into a 2-dimensional
> array (or list?)?
> b) which modul to use (Numeric, array, struct, pickle) ???

If you want 2-d arrays, the only option is Numeric, which is a
great tool for this kind of thing anyway. Unfortunately, Numeric arrays
do not have a "fromfile()" method, so you have to read the data into a
string first, and then put it into the array:

from Numeric import *

# numbytes should be m * n * 8: a Float (C double) is 8 bytes
A = fromstring(file.read(numbytes), Float)
A.shape = (m, n)

You now have an m x n array of the Python float type (C
double). The downside of this is that you have two copies of the data in
memory in the middle of this process. The only solutions to that are:

1) Use mmap'ed files (this is a bit tricky, but it can be done). See the
mmap module.
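A minimal sketch of the mmap route, pulling single values out of a file
of doubles without ever reading the whole thing into a string
(`nth_double` is a hypothetical helper, and I'm assuming little-endian
data):

```python
import mmap
import struct

def nth_double(path, i):
    """Read the i-th C double from a binary file via mmap;
    the OS pages data in as needed, so nothing is copied up front."""
    with open(path, 'rb') as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            # unpack directly from the mapped region at offset i * 8
            return struct.unpack_from('<d', mm, i * 8)[0]
        finally:
            mm.close()
```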

2) Write a fromfile() function for NumPy, in C. I'd love to see that!
I've been meaning to do it myself, but have not gotten around to it yet.

Note that Paul Rubin's suggestion of using the array module has the same
problem: the data is read directly into an array.array, but it would
then have to be copied into a NumPy array to get a 2-d array.
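For completeness, the array-module route looks like this (the helper
name is mine; the point is that fromfile() reads straight into the
array with no intermediate string):

```python
import array

def read_doubles_array(f, n):
    """Read n C doubles from an open binary file into an
    array.array -- flat only; no 2-d shape without a copy."""
    a = array.array('d')
    a.fromfile(f, n)   # reads directly from the file object
    return a
```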


If you need to read binary data of mixed type (records, each
with a couple of floats and an int, for instance), you can read it in as
bytes, and then slice it and convert it to the right types. I have some C
code that makes this fast and memory efficient. Send me an email if you
want it.
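A pure-Python version of that slice-and-convert idea can be done with
the struct module; the two-doubles-plus-an-int layout below is just an
example, and I'm assuming standard sizes and little-endian byte order:

```python
import struct

# hypothetical record layout: two doubles followed by one int
REC = struct.Struct('<ddi')

def read_records(data):
    """Slice a byte string into (float, float, int) tuples,
    one per fixed-size record."""
    return [REC.unpack_from(data, off)
            for off in range(0, len(data), REC.size)]
```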

> c) I have some fortran code which handles this quite good - is there a
> similar way to do it in python?

Probably nothing similar to Fortran; if you loop through the data,
reading a few bytes at a time, it will be VERY slow. Python + NumPy is
an excellent way to do what you want to do, however.

> I managed to import a test file of some megs but it took ages and i'm not
> sure i did it right, since most values appeared to be 0.0 's... which might be
> ok...?!?

You're going to need to test against known data to be sure, and if it is
very slow, you are probably looping, which will be a killer.
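If the values really are coming out wrong, byte order is one easy thing
to rule out (and note that Fortran unformatted files also wrap each
record in length markers, which will misalign a naive read). A quick
sketch of a reader that lets you try both orders -- the helper name is
hypothetical:

```python
import struct

def read_doubles(data, big_endian=False):
    """Unpack a byte string of C doubles; flip big_endian as a
    sanity check when the numbers look wrong."""
    fmt = '%s%dd' % ('>' if big_endian else '<', len(data) // 8)
    return struct.unpack(fmt, data)
```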

If you want to send me a little sample data and the specs, I could whip
out a little example code for you. 

-Chris


-- 
Christopher Barker,
Ph.D.                                                           
ChrisHBarker at home.net                 ---           ---           ---
http://members.home.net/barkerlohmann ---@@       -----@@       -----@@
                                   ------@@@     ------@@@     ------@@@
Oil Spill Modeling                ------   @    ------   @   ------   @
Water Resources Engineering       -------      ---------     --------    
Coastal and Fluvial Hydrodynamics --------------------------------------
------------------------------------------------------------------------



