Newbie - converting csv files to arrays in NumPy - Matlab vs. Numpy comparison
oyekomova
oyekomova at hotmail.com
Wed Jan 10 14:48:06 EST 2007
Thanks for your help. I compared the following NumPy code with Matlab's
csvread on a very large CSV file. Matlab read the file in 577 seconds;
the code below, by contrast, was still running after more than two
hours. Can this program be made more efficient? FYI, the CSV file was a
simple six-column file with a header row and more than a million
records.
import csv
import time
from numpy import array

t1 = time.perf_counter()
with open('somename.csv', newline='') as file_to_read:
    read_from = csv.reader(file_to_read)
    next(read_from)  # skip the header row
    # now the real data
    datalist = [[float(x) for x in row] for row in read_from]
data = array(datalist, dtype=float)
elapsed = time.perf_counter() - t1
print(elapsed)
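One plausible contributor to the slowdown is that the loop above builds a list of a million Python lists (about six million separate float objects) before the array even exists. A minimal sketch of an alternative, assuming every data row has exactly `ncols` numeric fields: the values are streamed straight into `numpy.fromiter`, which fills a single flat float array that is reshaped at the end. The helper name `float_array_from_csv_fast` and its `ncols` parameter are hypothetical, not from the thread; time it on your own data before relying on it.

```python
import csv
import os
import tempfile

import numpy as np


def float_array_from_csv_fast(filename, ncols, skip_header=True):
    """Stream a numeric CSV file into a 2-D float64 array (hypothetical helper).

    Parsed values go straight into np.fromiter, which fills one flat array,
    so no intermediate list of per-row Python lists is built.
    """
    with open(filename, newline='') as f:
        reader = csv.reader(f)
        if skip_header:
            next(reader)  # discard the header row
        # One flat stream of floats across all rows, in row-major order.
        values = (float(x) for row in reader for x in row)
        flat = np.fromiter(values, dtype=float)
    # Raises ValueError if the total value count is not a multiple of ncols.
    return flat.reshape(-1, ncols)


# Small demonstration on a throwaway two-column file.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write('x,y\n1,2\n3.5,4\n')
arr = float_array_from_csv_fast(tmp.name, ncols=2)
os.remove(tmp.name)
print(arr.shape)  # (2, 2)
```

If the number of rows is known in advance, passing `count=nrows * ncols` to `np.fromiter` lets NumPy allocate the output once instead of growing it. Note that a row with the wrong number of fields would silently shift later values, so this sketch only suits clean, rectangular files.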
Robert Kern wrote:
> oyekomova wrote:
> > I would like to know how to convert a csv file with a header row into a
> > floating point array without the header row.
>
> Use the standard library module csv. Something like the following is a cheap and
> cheerful solution:
>
> import csv
> import numpy
>
> def float_array_from_csv(filename, skip_header=True):
>     with open(filename, newline='') as f:
>         reader = csv.reader(f)
>         floats = []
>         if skip_header:
>             next(reader)  # discard the header row
>         for row in reader:
>             # Convert each field of the row to a float.
>             floats.append([float(x) for x in row])
>
>     return numpy.array(floats)
>
> --
> Robert Kern
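For the original question (a header row followed by purely numeric columns), NumPy's own text readers can skip the header themselves. A minimal sketch using `numpy.loadtxt` (with `skiprows`) and `numpy.genfromtxt` (with `skip_header`, which also tolerates missing values); an in-memory `io.StringIO` stands in for the real file here. Whether either beats the csv-module loop on a million-row file depends on the NumPy version (recent releases reimplemented `loadtxt` in C), so measure before choosing.

```python
import io

import numpy as np

# Stand-in for a CSV file: one header row followed by numeric rows.
CSV_TEXT = "a,b,c\n1,2,3\n4,5,6\n"

# loadtxt: skiprows=1 drops the header; values are parsed as float by default.
data = np.loadtxt(io.StringIO(CSV_TEXT), delimiter=',', skiprows=1)

# genfromtxt: same result here, but it can also fill in missing fields.
data2 = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=',', skip_header=1)

print(data.shape)  # (2, 3)
```

With a real file, pass the filename (for example `'somename.csv'`) in place of the `StringIO` object.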
More information about the Python-list mailing list