Newbie - converting csv files to arrays in NumPy - Matlab vs. Numpy comparison

sturlamolden sturlamolden at yahoo.no
Wed Jan 10 15:43:42 EST 2007


oyekomova wrote:
> Thanks for your help. I compared the following code in NumPy with the
> csvread in Matlab for a very large csv file. Matlab read the file in
> 577 seconds. On the other hand, this code below kept running for over 2
> hours. Can this program be made more efficient? FYI - The csv file was
> a simple 6 column file with a header row and more than a million
> records.
>
>
> import csv
> from numpy import array
> import time
> t1=time.clock()
> file_to_read = file('somename.csv','r')
> read_from = csv.reader(file_to_read)
> read_from.next()

> datalist = [ map(float, row[:]) for row in read_from ]

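I'm willing to bet that this is your problem. Python lists are arrays
of object pointers under the hood, so this line builds more than a
million small lists of individually allocated float objects before
NumPy ever sees the data.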
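To make that overhead concrete, here is a rough sketch (not a benchmark) that compares the bytes held by a list of floats with the bytes held by the equivalent NumPy array. The variable names are made up for illustration, and the exact numbers depend on the interpreter and platform; only the ordering is the point.

```python
import sys
import numpy as np

n_values = 1000  # illustrative size, far smaller than the real file
as_list = [float(i) for i in range(n_values)]
as_array = np.array(as_list, dtype=float)

# list: the pointer array plus one separately allocated float object per element
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
# array: raw 8-byte doubles in one contiguous buffer (array header not counted)
array_bytes = as_array.nbytes

print(list_bytes, array_bytes)
```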

Try something like this instead:


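import csv
from numpy import empty

# read the whole file in one chunk (do not consume the header beforehand)
file_to_read = open('somename.csv', 'r')
lines = file_to_read.readlines()
file_to_read.close()
# count the number of columns in the first data row (lines[0] is the header)
n = lines[1].count(',') + 1
# count the number of data rows
m = len(lines) - 1
# allocate the result array once
data = empty((m, n), dtype=float)
# create csv reader over the data rows, skipping the header
reader = csv.reader(lines[1:])
# parse each row straight into the preallocated array
for i, row in enumerate(reader):
    data[i, :] = [float(x) for x in row]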

And if this is too slow, you may consider collapsing the last loop into
a single assignment:

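data = empty((m, n), dtype=float)
# strip line endings first, or the joined string contains embedded newlines
newstr = ",".join(line.strip() for line in lines[1:])
flatdata = data.reshape(n * m) # flatdata is a view of data, not a copy
reader = csv.reader([newstr])
flatdata[:] = [float(x) for x in next(reader)]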
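As an aside that is not part of the suggestion above: NumPy also ships its own text reader, numpy.loadtxt, which can skip the header and split on commas without the csv module. The following is only a minimal sketch, assuming a NumPy version whose loadtxt accepts the delimiter, skiprows and ndmin arguments. The helper name and the in-memory sample are hypothetical, and no claim is made here about how it compares in speed on a million-row file.

```python
import io
import numpy as np

def load_csv_floats(source):
    """Hypothetical helper: parse comma-separated floats, skipping one header row."""
    # ndmin=2 keeps the result two-dimensional even if there is a single data row
    return np.loadtxt(source, delimiter=",", skiprows=1, ndmin=2)

# Tiny in-memory stand-in for 'somename.csv' (illustrative data only)
sample = io.StringIO("a,b,c\n1,2,3\n4.5,5.5,6.5\n")
data = load_csv_floats(sample)
print(data.shape)
```

With a real file, the filename can be passed in place of the StringIO object.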

I hope this helps!

> Robert Kern wrote:
> > oyekomova wrote:
> > > I would like to know how to convert a csv file with a header row into a
> > > floating point array without the header row.
> >
> > Use the standard library module csv. Something like the following is a cheap and
> > cheerful solution:
> >
> >
> > import csv
> > import numpy
> >
> > def float_array_from_csv(filename, skip_header=True):
> >     f = open(filename)
> >     try:
> >         reader = csv.reader(f)
> >         floats = []
> >         if skip_header:
> >             reader.next()
> >         for row in reader:
> >             floats.append(map(float, row))
> >     finally:
> >         f.close()
> >
> >     return numpy.array(floats)
> >
> > --
> > Robert Kern
> >
> > "I have come to believe that the whole world is an enigma, a harmless enigma
> >  that is made terrible by our own mad attempt to interpret it as though it had
> >  an underlying truth."
> >   -- Umberto Eco



