Newbie - converting csv files to arrays in NumPy - Matlab vs. Numpy comparison
sturlamolden
sturlamolden at yahoo.no
Wed Jan 10 15:43:42 EST 2007
oyekomova wrote:
> Thanks for your help. I compared the following code in NumPy with the
> csvread in Matlab for a very large csv file. Matlab read the file in
> 577 seconds. On the other hand, this code below kept running for over 2
> hours. Can this program be made more efficient? FYI - The csv file was
> a simple 6 column file with a header row and more than a million
> records.
>
>
> import csv
> from numpy import array
> import time
> t1=time.clock()
> file_to_read = file('somename.csv','r')
> read_from = csv.reader(file_to_read)
> read_from.next()
> datalist = [ map(float, row[:]) for row in read_from ]
I'm willing to bet that this is your problem. Python lists are arrays
under the hood!
Try something like this instead:
from numpy import empty, arange

# read the whole file in one chunk
lines = file_to_read.readlines()
# count the number of columns
n = 1
for c in lines[1]:
    if c == ',': n += 1
# count the number of rows
m = len(lines[1:])
# allocate
data = empty((m,n),dtype=float)
# create csv reader, skip header
reader = csv.reader(lines[1:])
# read
for i in arange(0,m):
    data[i,:] = map(float,reader.next())
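For anyone reading this on Python 3, here is a minimal sketch of the same preallocate-then-fill idea. The function name read_csv_preallocated and the in-memory sample are only illustrative (not from the thread), and it assumes a single header row, no quoted commas, and the same number of fields on every row:

```python
import csv
import io

import numpy as np


def read_csv_preallocated(text_stream):
    """Parse comma-separated floats into a preallocated (m, n) array.

    Hypothetical helper: skips exactly one header row and assumes
    no field contains a quoted comma.
    """
    lines = text_stream.read().splitlines()
    # Drop the header row and any blank lines.
    body = [line for line in lines[1:] if line.strip()]
    m = len(body)
    if m == 0:
        return np.empty((0, 0), dtype=float)
    # Count columns from the first data row.
    n = body[0].count(",") + 1
    # Allocate the whole array once, then fill it row by row.
    data = np.empty((m, n), dtype=float)
    for i, row in enumerate(csv.reader(body)):
        data[i, :] = [float(x) for x in row]
    return data


sample = io.StringIO("a,b,c\n1,2,3\n4.5,5,6\n")
result = read_csv_preallocated(sample)
```

With a real file you would pass open('somename.csv', newline='') instead of the StringIO object. Whether this actually beats building a list first is something to time on your own data.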
And if this is too slow, you may consider vectorizing the last loop:
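data = empty((m,n),dtype=float)
# strip line endings first; csv.reader rejects a newline inside an unquoted field
newstr = ",".join([line.strip() for line in lines[1:]])
flatdata = data.reshape((n*m)) # flatdata is a view of data, not a copy
reader = csv.reader([newstr])
flatdata[:] = map(float,reader.next())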
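A rough Python 3 sketch of the same flatten-and-convert idea follows. It swaps the csv reader for a plain str.split, so it assumes purely numeric fields with no quoting and equal-length rows. The name read_csv_flat and the sample data are made up for illustration:

```python
import io

import numpy as np


def read_csv_flat(text_stream):
    """Convert all fields in one call, then reshape to (m, n).

    Hypothetical helper: skips one header row; reshape raises
    ValueError if the rows do not all have the same length.
    """
    raw = text_stream.read().splitlines()[1:]
    lines = [line.strip() for line in raw if line.strip()]
    m = len(lines)
    if m == 0:
        return np.empty((0, 0), dtype=float)
    n = lines[0].count(",") + 1
    # One long list of field strings, converted to floats in a single step.
    flat = np.array(",".join(lines).split(","), dtype=float)
    return flat.reshape(m, n)


flat_result = read_csv_flat(io.StringIO("x,y\n1,2\n3,4\n5,6.5\n"))
```

The conversion loop now runs inside NumPy rather than in Python, at the cost of holding the joined string and the list of field strings in memory at the same time.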
I hope this helps!
> Robert Kern wrote:
> > oyekomova wrote:
> > > I would like to know how to convert a csv file with a header row into a
> > > floating point array without the header row.
> >
> > Use the standard library module csv. Something like the following is a cheap and
> > cheerful solution:
> >
> >
> > import csv
> > import numpy
> >
> > def float_array_from_csv(filename, skip_header=True):
> >     f = open(filename)
> >     try:
> >         reader = csv.reader(f)
> >         floats = []
> >         if skip_header:
> >             reader.next()
> >         for row in reader:
> >             floats.append(map(float, row))
> >     finally:
> >         f.close()
> >
> >     return numpy.array(floats)
> >
> > --
> > Robert Kern