[SciPy-user] Reading in data as arrays, quickly and easily?
Travis Oliphant
oliphant at ee.byu.edu
Fri Jul 9 16:38:05 EDT 2004
Eric Jonas wrote:
>Hello! I'm trying to read in large chunks of binary data as arrays, but
>the file formats are complex enough that there is lots of junk that
>needs to be skipped over. I have a functioning datafile object in python
>with a read(N) method that returns the next N data points in the file,
>doing the various raw manipulations, endian conversions, and the like
>internally.
>
>
The scipy.io module has some tools for this. It handles byte-swapping
and reads directly into a Numeric array.
Look at
scipy.io.fopen
and then the fid.read method.
>>> info(io.fopen)
 fopen(file_name, permission='rb', format='n')

Class for reading and writing binary files into Numeric arrays.

Inputs:

  file_name  -- The complete path name to the file to open.
  permission -- Open the file with given permissions: ('r', 'w', 'a')
                for reading, writing, or appending. This is the same
                as the mode argument in the builtin open command.
  format     -- The byte-ordering of the file:
                (['native', 'n'], ['ieee-le', 'l'], ['ieee-be', 'b']) for
                native, little-endian, or big-endian respectively.

Methods:

  read       -- read data from file and return Numeric array
  write      -- write to file from Numeric array
  fort_read  -- read Fortran-formatted binary data from the file.
  fort_write -- write Fortran-formatted binary data to the file.
  rewind     -- rewind to beginning of file
  size       -- get size of file
  seek       -- seek to some position in the file
  tell       -- return current position in file
  close      -- close the file

Attributes (Read only):

  bs         -- non-zero if byte-swapping is performed on read and write.
  format     -- 'native', 'ieee-le', or 'ieee-be'
  fid        -- the file object
  closed     -- non-zero if the file is closed.
  mode       -- permissions with which this file was opened
  name       -- name of the file
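
As a rough modern counterpart to the open / seek / read workflow listed above, here is a minimal sketch that uses NumPy in place of Numeric and scipy.io.fopen. The helper name read_block, its skip parameter, and the demo file layout (4 junk bytes followed by three big-endian 32-bit integers) are assumptions made for illustration; they are not part of scipy.io.

```python
import os
import struct
import tempfile

import numpy as np


def read_block(f, count, dtype, skip=0):
    """Skip `skip` junk bytes, then read `count` items of `dtype` from open file `f`.

    Hypothetical helper: `dtype` carries the on-disk byte order,
    e.g. '>i4' for big-endian int32 or '<f8' for little-endian float64.
    """
    dt = np.dtype(dtype)
    f.seek(skip, os.SEEK_CUR)  # step over the junk relative to the current position
    raw = f.read(count * dt.itemsize)
    if len(raw) != count * dt.itemsize:
        raise EOFError("file ended before %d items could be read" % count)
    # frombuffer interprets the bytes using the byte order stored in the dtype;
    # astype then converts to native byte order and returns a writable copy.
    return np.frombuffer(raw, dtype=dt).astype(dt.newbyteorder("="))


# Build a small demo file: 4 junk header bytes, then three big-endian int32 values.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"JUNK" + struct.pack(">3i", 1, 2, 3))
    demo_path = tmp.name

with open(demo_path, "rb") as f:
    values = read_block(f, 3, ">i4", skip=4)

os.remove(demo_path)
print(values)
```

Putting the byte order in the dtype plays the role of the format argument above: the swap happens during the conversion to native order, so the caller never handles raw bytes.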
If you want to use a lower-level tool you can just open a file with
Python and then pass it to
scipy.io.numpyio.fread
>>> info(io.numpyio.fread)
 g = numpyio.fread( fid, Num, read_type { mem_type, byteswap})

     fid       = open file pointer object (i.e. from fid = open('filename') )
     Num       = number of elements to read of type read_type
     read_type = a character in 'cb1silfdFD' (PyArray types)
                 describing how to interpret bytes on disk.
OPTIONAL
     mem_type  = a character (PyArray type) describing what kind of
                 PyArray to return in g. Default = read_type
     byteswap  = 0 for no byteswapping or a 1 to byteswap (to handle
                 different endianness). Default = 0.
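
To make the fid / Num / read_type / byteswap arguments concrete, here is a minimal sketch of the same idea using only the Python standard library. The function name fread_sketch is hypothetical, and the type characters are those of the standard array module, not the Numeric 'cb1silfdFD' set described above.

```python
import io
import struct
import sys
from array import array


def fread_sketch(fid, num, typecode, byteswap=False):
    """Read `num` items of `typecode` from the open binary file `fid`.

    Hypothetical stand-in for the fread call described above; `typecode`
    is an array-module type character such as 'i' or 'd'.
    """
    out = array(typecode)
    raw = fid.read(num * out.itemsize)
    if len(raw) != num * out.itemsize:
        raise EOFError("file ended before %d items could be read" % num)
    out.frombytes(raw)  # interpret the bytes in native byte order
    if byteswap:
        out.byteswap()  # reverse the bytes of every item in place
    return out


# Demo: 8 junk bytes followed by four doubles stored in the non-native byte order.
foreign = ">" if sys.byteorder == "little" else "<"
payload = struct.pack(foreign + "4d", 1.5, -2.0, 0.25, 8.0)
fid = io.BytesIO(b"\x00" * 8 + payload)

fid.seek(8)  # skip the junk before reading
data = fread_sketch(fid, 4, "d", byteswap=True)
print(list(data))
```

Because the function accepts any open binary file object, the caller remains free to seek past headers or padding between reads.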
Alternatively, you can use weave or f2py (yes, it can wrap C code too) if
your pre-processing needs are more extensive than byteswapping and you
can't do it in Numeric after the fact.
-Travis O.