[SciPy-dev] Binary i/o package

Erin Sheldon erin.sheldon at gmail.com
Wed May 30 15:14:34 EDT 2007


Hi all -

The tofile() and fromfile() methods are excellent for reading and
writing arrays to disk, and there are good tools in scipy for ASCII
input/output.

I often find myself writing huge binary files to disk and then wanting
to extract particular rows and columns from that file.  It is natural
to associate the fields of a numpy array with the fields in the file,
which may be inhomogeneous.  The ability to extract this information
is straightforward to code in C/C++ and brings the file close to a
database in functionality without all the overhead of working with a
full database or PyTables.

I have written a simple C++ numpy extension for reading binary data
into a numpy array, with the ability to select rows and fields
(columns).  One enters the dtype that describes each row in
list-of-tuples form and the code creates a numpy array (with perhaps a
subset of the fields), reads in the requested data, and returns the
result.  Pretty simple.
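
To make the idea concrete, here is a rough pure-NumPy sketch of the same kind of operation. It is not the readfields code or its interface: the function name read_subset, the field names, and the use of np.memmap in place of the C++ reader are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np

# Row layout in list-of-tuples form. The field names are made up for this example.
row_dtype = np.dtype([("id", "<i4"), ("ra", "<f8"), ("dec", "<f8"), ("flux", "<f4")])


def read_subset(path, dtype, rows=None, fields=None):
    """Return selected rows and fields of a headerless binary table (hypothetical helper)."""
    dtype = np.dtype(dtype)
    # Map the file instead of reading it all; only the rows touched are loaded.
    table = np.memmap(path, dtype=dtype, mode="r")
    if rows is None:
        rows = slice(None)
    if fields is None:
        fields = list(dtype.names)
    picked = table[rows]
    # Build an output array that holds only the requested fields.
    out_dtype = np.dtype([(name, dtype[name]) for name in fields])
    out = np.empty(picked.shape, dtype=out_dtype)
    for name in fields:
        out[name] = picked[name]
    return out


# Write a small table with tofile(), then pull out two rows and two fields.
data = np.zeros(5, dtype=row_dtype)
data["id"] = np.arange(5)
data["ra"] = np.linspace(10.0, 14.0, 5)
data["flux"] = 2.0 * np.arange(5)

path = os.path.join(tempfile.mkdtemp(), "table.bin")
data.tofile(path)

subset = read_subset(path, row_dtype, rows=[1, 3], fields=["id", "flux"])
print(subset)
```

The dtype passed in has to match the on-disk row layout exactly (NumPy does not pad fields unless asked to), since the file itself carries no description of its contents.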

I feel like this is a pretty generic and useful type of operation, and
if people agree I think it could go into the scipy io subpackage.

The package is currently called readfields; it contains readfields.so,
built from the C++ code, as well as simple_format.py, which contains
functions to create files with a simple self-describing header (and
data written using tofile()) and a read function which parses the
header and uses readfields to extract subsets of data.
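
As an illustration of what a self-describing file of this kind can look like, here is a minimal sketch. The header layout (one ASCII line holding the row count and the dtype description) and the names write_simple and read_simple are invented for this example; they are not the format or functions in simple_format.py.

```python
import ast
import os
import tempfile

import numpy as np


def write_simple(path, array):
    """Write a one-line text header, then the raw rows (hypothetical format)."""
    header = repr({"nrows": int(array.size), "dtype": array.dtype.descr})
    with open(path, "wb") as f:
        f.write(header.encode("ascii") + b"\n")
        array.tofile(f)


def read_simple(path):
    """Parse the header line, then read the rows it describes."""
    with open(path, "rb") as f:
        header = ast.literal_eval(f.readline().decode("ascii"))
        dtype = np.dtype(header["dtype"])
        return np.fromfile(f, dtype=dtype, count=header["nrows"])


# Round trip a small structured array through the file.
original = np.zeros(4, dtype=[("id", "<i4"), ("x", "<f8")])
original["id"] = np.arange(4)
original["x"] = 0.5 * np.arange(4)

simple_path = os.path.join(tempfile.mkdtemp(), "simple.dat")
write_simple(simple_path, original)
restored = read_simple(simple_path)
print(restored)
```

Because the dtype travels with the data, a reader needs no outside knowledge of the row layout, and a field-selecting reader can be pointed at the bytes that follow the header.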

Anyone interested in trying it out can get the package here:

http://sdss.physics.nyu.edu/esheldon/python/code/

Erin