Newbie - converting csv files to arrays in NumPy - Matlab vs. NumPy comparison

Wed Jan 10 14:48:06 EST 2007

Thanks for your help. I compared the NumPy code below with Matlab's
csvread on a very large csv file. Matlab read the file in 577 seconds,
whereas the code below was still running after more than 2 hours. Can
this program be made more efficient? FYI: the csv file is a simple
6-column file with a header row and more than a million records.

import csv
from numpy import array
import time

t1 = time.clock()
file_to_read = open('somename.csv', 'r')
read_from = csv.reader(file_to_read)
read_from.next()  # skip the header row

# convert every remaining row to a list of floats
datalist = [map(float, row) for row in read_from]
file_to_read.close()

# now build the array from the parsed rows
data = array(datalist, dtype=float)

elapsed = time.clock() - t1
print elapsed
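
For comparison, here is a minimal sketch of the same read done with NumPy's own text loader, numpy.loadtxt, instead of building a list of lists first. It is written in Python 3 syntax (the code above is Python 2), and the function name load_csv_floats is a hypothetical one chosen for illustration. Whether this is actually faster depends on the NumPy version and the machine, so it would need to be timed on the real file.

```python
import numpy as np

def load_csv_floats(path):
    """Read a comma-separated file with one header row into a 2-D float array.

    Hypothetical helper name; assumes every data field parses as a float.
    """
    # skiprows=1 drops the header line, delimiter=',' splits the columns,
    # and ndmin=2 keeps the result 2-D even if there is only one data row.
    return np.loadtxt(path, delimiter=',', skiprows=1, ndmin=2)
```

Usage would be along the lines of data = load_csv_floats('somename.csv'), giving an array with one row per record and one column per field.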

Robert Kern wrote:
> oyekomova wrote:
> > I would like to know how to convert a csv file with a header row into a
> > floating point array without the header row.
>
> Use the standard library module csv. Something like the following is a cheap and
> cheerful solution:
>
>
> import csv
> import numpy
>
> def float_array_from_csv(filename, skip_header=True):
>     f = open(filename)
>     try:
>         reader = csv.reader(f)
>         floats = []
>         if skip_header:
>             reader.next()
>         for row in reader:
>             floats.append(map(float, row))
>     finally:
>         f.close()
>
>     return numpy.array(floats)
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless enigma
>  that is made terrible by our own mad attempt to interpret it as though it had
>  an underlying truth."
>   -- Umberto Eco
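
As a follow-up to the function quoted above, the sketch below is one possible Python 3 variant that avoids holding a list of lists of Python floats in memory. It is not the code from the quoted message: it swaps in numpy.fromiter, which fills a flat array directly from a generator, and then reshapes it. It assumes every row has the same number of columns as the first data row and that every field parses as a float. No speed-up is claimed here; it would have to be measured on the actual file.

```python
import csv
import itertools
import numpy as np

def float_array_from_csv(filename, skip_header=True):
    """Stream a csv file into a 2-D float array without a list of lists.

    Assumption: all data rows have the same number of fields.
    """
    with open(filename, newline='') as f:
        reader = csv.reader(f)
        if skip_header:
            next(reader, None)          # discard the header row
        first = next(reader, None)      # peek at the first data row
        if first is None:
            return np.empty((0, 0))     # no data rows at all
        ncols = len(first)
        rows = itertools.chain([first], reader)
        # fromiter consumes the generator while the file is still open
        flat = np.fromiter((float(x) for row in rows for x in row),
                           dtype=float)
    # a ragged file would make this reshape raise ValueError
    return flat.reshape(-1, ncols)
```

The trade-off is that the flat array is grown as values arrive, since the number of records is not known in advance; if the row count were known, passing count= to numpy.fromiter would let it allocate once.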