[Numpy-discussion] How to read data from text files fast?
Fernando Perez
Fernando.Perez at colorado.edu
Thu Jul 1 13:28:01 EDT 2004
Chris Barker wrote:
> Hi all,
>
> I'm looking for a way to read data from ascii text files quickly. I've
> found that using the standard python idioms like:
>
> data = array((M,N),Float)
> for i in range(N):
>     data.append(map(float,file.readline().split()))
>
> Can be pretty slow. What I'd like is something like Matlab's fscanf:
>
> data = fscanf(file, "%g", [M,N] )
>
> I may have the syntax a little wrong, but the gist is there. What Matlab
> does is keep recycling the format string until the desired number of
> elements has been read.
>
> It is quite flexible, and ends up being pretty fast.
>
> Has anyone written something like this for Numeric (or numarray, but I'd
> prefer Numeric at this point) ?
>
> I was surprised not to find something like this in SciPy, maybe I didn't
> look hard enough.
scipy.io.read_array?
I haven't timed it, because it's been 'fast enough' for my needs.
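For illustration only, the fscanf-style behavior asked about above can be sketched in plain Python 3 with nothing but the standard library. This is not scipy.io.read_array and makes no claim about its speed; the name read_ascii_table and its parameters are hypothetical. It consumes whitespace-separated numbers in order, ignoring line breaks, until the requested count is reached, then regroups them into rows:

```python
import io

def read_ascii_table(fobj, nrows, ncols):
    """Read nrows*ncols whitespace-separated floats from an open text file.

    Hypothetical helper: line breaks are ignored, so values are consumed
    in order until the requested count is reached, much like recycling
    a single "%g" format.
    """
    needed = nrows * ncols
    values = []
    for line in fobj:
        # split() with no argument handles spaces, tabs and newlines
        values.extend(map(float, line.split()))
        if len(values) >= needed:
            break
    if len(values) < needed:
        raise ValueError("expected %d values, found %d" % (needed, len(values)))
    values = values[:needed]
    # regroup the flat list into nrows rows of ncols values each
    return [values[r * ncols:(r + 1) * ncols] for r in range(nrows)]

# In-memory stand-in for a data file
sample = io.StringIO("1 2 3\n4 5 6\n")
table = read_ascii_table(sample, 2, 3)
```

Anything left on the last line read after the requested count is discarded, so this suits files holding a single table rather than several blocks read one after another.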
For reading binary data files, I have this little utility which is basically a
wrapper around Numeric.fromstring (N below is Numeric imported 'as N'). Note
that it can read binary .gz files directly, a _huge_ gain for very sparse
files representing 3d arrays (I can read a 400k gz file which blows up to
~60MB when unzipped in no time at all, while reading the unzipped file is very
slow):
import gzip
import Numeric as N

def read_bin(fname,dims,typecode,recast_type=None,offset=0,verbose=0):
    """Read in a binary data file.

    Does NOT check for endianness issues.

    Inputs:
      fname - can be .gz
      dims (nx1,nx2,...,nxd)
      typecode
      recast_type
      offset=0: # of bytes to skip in file *from the beginning* before data starts
    """

    # config parameters
    item_size = N.zeros(1,typecode).itemsize()  # size in bytes
    data_size = N.product(N.array(dims))*item_size

    # read in data
    if fname.endswith('.gz'):
        data_file = gzip.open(fname)
    else:
        data_file = file(fname,'rb')  # binary mode matters on Windows
    data_file.seek(offset)
    data = N.fromstring(data_file.read(data_size),typecode)
    data_file.close()
    data.shape = dims
    if verbose:
        #print 'Read',data_size/item_size,'data points. Shape:',dims
        print 'Read',N.size(data),'data points. Shape:',dims
    if recast_type is not None:
        data = data.astype(recast_type)
    return data
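
For anyone without Numeric at hand, the same idea (read a fixed number of bytes from a plain or .gz file and reinterpret them as numbers) can be sketched with only the Python 3 standard library. This is an assumption-laden stand-in, not the utility above: the name read_bin_stdlib is hypothetical, it uses the array module's typecodes rather than Numeric's, and it returns a flat array without reshaping. Like read_bin, it does not check for endianness issues.

```python
import array
import gzip
import os
import tempfile

def read_bin_stdlib(fname, count, typecode, offset=0):
    """Read `count` items of array-module type `typecode` from a binary file.

    Hypothetical stand-in for read_bin above; fname can be .gz.
    Does NOT check for endianness issues. Returns a flat array.array.
    """
    opener = gzip.open if fname.endswith('.gz') else open
    item_size = array.array(typecode).itemsize  # size in bytes
    with opener(fname, 'rb') as data_file:
        data_file.seek(offset)  # skip header bytes, counted from the start
        raw = data_file.read(count * item_size)
    data = array.array(typecode)
    data.frombytes(raw)
    return data

# Round trip: write 6 doubles after a 4-byte header, then read them back.
values = array.array('d', [0.0, 1.5, 2.5, 3.0, 4.0, 5.0])
path = os.path.join(tempfile.mkdtemp(), 'demo.bin.gz')
with gzip.open(path, 'wb') as out:
    out.write(b'HEAD' + values.tobytes())

restored = read_bin_stdlib(path, len(values), 'd', offset=4)
```

The offset is applied to the uncompressed stream, so the same call works whether or not the file is gzipped.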
HTH,
f