[SciPy-user] Reading in data as arrays, quickly and easily?

Fri Jul 9 16:38:05 EDT 2004

Eric Jonas wrote:

>Hello! I'm trying to read in large chunks of binary data as arrays, but
>the file formats are complex enough that there is lots of junk that
>needs to be skipped over. I have a functioning datafile object in python
>with a read(N) method that returns the next N data points in the file,
>doing the various raw manipulations, endian conversions, and the like
>internally.

The scipy.io facility has some tools for this.  It handles
byte-swapping and reads directly into a Numeric array.

Look at

scipy.io.fopen

and then the fid.read method.

 >>> info(io.fopen)
 fopen(file_name, permission='rb', format='n')

Class for reading and writing binary files into Numeric arrays.

Inputs:

  file_name -- The complete path name to the file to open.
  permission -- Open the file with given permissions: ('r', 'w', 'a')
                for reading, writing, or appending.  This is the same
                as the mode argument in the builtin open command.
  format -- The byte-ordering of the file:
            (['native', 'n'], ['ieee-le', 'l'], ['ieee-be', 'b']) for
            native, little-endian, or big-endian respectively.

Methods:

  read -- read data from file and return Numeric array
  write -- write to file from Numeric array
  fort_read -- read Fortran-formatted binary data from the file.
  fort_write -- write Fortran-formatted binary data to the file.
  rewind -- rewind to beginning of file
  size -- get size of file
  seek -- seek to some position in the file
  tell -- return current position in file
  close -- close the file

Attributes (Read only):

  bs -- non-zero if byte-swapping is performed on read and write.
  format -- 'native', 'ieee-le', or 'ieee-be'
  fid -- the file object
  closed -- non-zero if the file is closed.
  mode -- permissions with which this file was opened
  name -- name of the file
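
For the "skip the junk, then read" pattern described in the question, here is a
minimal sketch of the same flow (open, seek past a header, read N values,
byte-swap if needed). It deliberately uses only the standard-library array and
struct modules rather than scipy.io.fopen, so it is an illustration of the
idea and not the fopen API itself. The helper name read_values, the 8-byte
header, and the sample values are all made up for the example.

```python
import array
import os
import struct
import sys
import tempfile


def read_values(path, offset, count, typecode="d", big_endian=True):
    """Skip `offset` bytes, then read `count` items into an array.array.

    Hypothetical helper for illustration; not part of scipy.io.
    """
    values = array.array(typecode)
    with open(path, "rb") as fid:
        fid.seek(offset)             # jump over the header / junk bytes
        values.fromfile(fid, count)  # raises EOFError if the file is too short
    # Swap bytes only when the file's byte order differs from this machine's.
    if big_endian != (sys.byteorder == "big"):
        values.byteswap()
    return values


# Build a small test file: 8 bytes of junk followed by 3 big-endian doubles.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"JUNKJUNK")
    tmp.write(struct.pack(">3d", 1.0, 2.5, -4.0))
    path = tmp.name

data = read_values(path, offset=8, count=3, typecode="d", big_endian=True)
os.remove(path)
print(list(data))
```

The byte-order check plays the role of the format argument above: the swap
happens only when the file and the machine disagree.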

If you want to use a lower-level tool you can just open a file with 
Python and then pass it to

scipy.io.numpyio.fread

 >>> info(io.numpyio.fread)
g = numpyio.fread( fid, Num, read_type { mem_type, byteswap})

     fid =       open file pointer object (i.e. from fid = open('filename') )
     Num =       number of elements to read of type read_type
     read_type = a character in 'cb1silfdFD' (PyArray types)
                 describing how to interpret bytes on disk.
OPTIONAL
     mem_type =  a character (PyArray type) describing what kind of
                 PyArray to return in g.   Default = read_type
     byteswap =  0 for no byteswapping or a 1 to byteswap (to handle
                 different endianness).    Default = 0.
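
To make the read_type / mem_type / byteswap distinction concrete, here is a
rough sketch of the same semantics written against the standard-library array
module. This is not numpyio.fread; the function name read_items is
hypothetical, and the type codes are those of the array module ('h' for a
2-byte integer, 'd' for a double), not the Numeric codes listed above.

```python
import array
import io
import struct
import sys


def read_items(fid, num, read_type, mem_type=None, byteswap=0):
    """Read `num` items stored on disk as `read_type`; return them as `mem_type`.

    Hypothetical stand-in that mimics the argument layout described above.
    """
    disk = array.array(read_type)
    raw = fid.read(num * disk.itemsize)
    disk.frombytes(raw)        # interpret the raw bytes as read_type items
    if byteswap:
        disk.byteswap()        # fix endianness before any type conversion
    if mem_type is None or mem_type == read_type:
        return disk            # default: in-memory type equals on-disk type
    return array.array(mem_type, disk.tolist())


# Three big-endian 16-bit integers, as they might sit in a data file.
fid = io.BytesIO(struct.pack(">3h", 1, -2, 300))

# Swap only if this machine is little-endian; return the values as doubles.
result = read_items(fid, 3, "h", mem_type="d",
                    byteswap=int(sys.byteorder == "little"))
print(list(result))
```

Note that the swap is applied to the on-disk type before converting to the
in-memory type; doing it in the other order would scramble the values.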

Alternatively, you can use weave or f2py (yes, it can wrap C code too) if
your pre-processing needs are more extensive than byteswapping and you
can't do it in Numeric after the fact.

-Travis O.