Skipping bytes while reading a binary file?

Lionel lionel.keene at gmail.com
Thu Feb 5 17:40:49 EST 2009


On Feb 5, 2:22 pm, Lionel <lionel.ke... at gmail.com> wrote:
> Hello,
> I have data stored in binary files. Some of these files are
> huge...upwards of 2 gigs or more. They consist of 32-bit float complex
> numbers where the first 32 bits of the file is the real component, the
> second 32bits is the imaginary, the 3rd 32-bits is the real component
> of the second number, etc.
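>
> To make the layout concrete, the first complex sample could be pulled
> out with the struct module. A quick sketch, assuming little-endian
> byte order (I haven't double-checked ours) and a made-up file name:
>
>     import struct
>
>     with open('interleaved.dat', 'rb') as fh:
>         # Two little-endian float32s: real part, then imaginary part.
>         re0, im0 = struct.unpack('<2f', fh.read(8))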
>
> I'd like to read the real components into one numpy.ndarray and the
> imaginary components into another. The two components must be stored
> in separate arrays; they cannot sit in a single array of complex
> numbers except temporarily. I'm trying to avoid even that temporary
> storage, though, because of the size of the files.
>
> I'm currently reading the file scanline-by-scanline to extract rows of
> complex numbers, which I then loop over and copy into the real and
> imaginary arrays as follows:
>
>     self._realData      = numpy.empty((Rows, Columns), dtype=numpy.float32)
>     self._imaginaryData = numpy.empty((Rows, Columns), dtype=numpy.float32)
>
>     for CurrentRow in range(Rows):
>
>         # Recreated each row: array.fromfile() appends, so the buffer
>         # must start empty or position 0 would keep indexing row 0's data.
>         floatData = array.array('f')
>         floatData.fromfile(DataFH, Columns * 2)
>
>         position = 0
>         for CurrentColumn in range(Columns):
>             self._realData[CurrentRow, CurrentColumn]      = floatData[position]
>             self._imaginaryData[CurrentRow, CurrentColumn] = floatData[position + 1]
>             position = position + 2
>
> The above code works but is much too slow. If I comment out the body
> of the "for CurrentColumn in range(Columns)" loop, the performance is
> perfectly adequate, i.e. the function-call overhead of "fromfile(...)"
> is not bad at all. What seems to be most time-consuming is the pair of
> element-by-element assignment statements in the "CurrentColumn"
> for-loop.
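>
> One idea I haven't benchmarked yet is to drop the inner loop entirely
> and assign whole scanlines with strided slices. A rough sketch,
> untested, assuming numpy.fromfile will read from my already-open
> handle DataFH:
>
>     for CurrentRow in range(Rows):
>         # One scanline: Columns interleaved (real, imag) float32 pairs.
>         scanline = numpy.fromfile(DataFH, dtype=numpy.float32,
>                                   count=Columns * 2)
>         self._realData[CurrentRow, :]      = scanline[0::2]  # even slots
>         self._imaginaryData[CurrentRow, :] = scanline[1::2]  # odd slots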
>
> Does anyone see any way of speeding this up? Reading everything into
> a complex64 ndarray in one fell swoop would certainly be easier and
> faster, but at some point I'd still need to split that array into its
> two parts (real / imaginary). I'd like the split done up front to keep
> memory usage down, since the files are so ginormous.
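>
> If the one-fell-swoop route is the answer, I assume the split would
> look roughly like this (the .real / .imag copies are exactly the
> temporary doubling I'm trying to avoid):
>
>     data = numpy.fromfile(DataFH, dtype=numpy.complex64,
>                           count=Rows * Columns).reshape(Rows, Columns)
>     self._realData      = data.real.copy()   # float32 copy, detached...
>     self._imaginaryData = data.imag.copy()   # ...from the complex temporary
>     del data                                 # release the complex64 block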
>
> Psyco is out because I need a 64-bit build, and I didn't see anything
> on the forums about a method that reads every other 32-bit chunk from
> a file into an array. I'm not sure what else to try.
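>
> The closest thing I can imagine to reading every other 32-bit chunk is
> a strided view over numpy.memmap, assuming the file really is nothing
> but raw interleaved float32s (file name made up):
>
>     raw = numpy.memmap('interleaved.dat', dtype=numpy.float32, mode='r')
>     # The strided views read lazily; numpy.array() copies each half into
>     # an ordinary in-memory array, one at a time.
>     self._realData      = numpy.array(raw[0::2]).reshape(Rows, Columns)
>     self._imaginaryData = numpy.array(raw[1::2]).reshape(Rows, Columns)
>     del raw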
>
> Thanks in advance.
> L

Hmmm... I've just discovered scipy's weave.inline(). Maybe I'll just do
the assignments in C.
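
Something like this, maybe? Completely untested; it leans on scipy's
weave with the blitz type converters as I understand them from the
docs, and the helper name is my own invention:

    from scipy import weave
    from scipy.weave import converters

    def split_scanline(scanline, real_row, imag_row, Columns):
        # scanline: 1-D float32, length 2*Columns, interleaved re/im.
        # real_row / imag_row: 1-D float32 outputs, length Columns.
        code = """
            for (int col = 0; col < Columns; ++col) {
                real_row(col) = scanline(2 * col);
                imag_row(col) = scanline(2 * col + 1);
            }
        """
        weave.inline(code, ['scanline', 'real_row', 'imag_row', 'Columns'],
                     type_converters=converters.blitz)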

Still soliciting advice, of course. :-)


