Newbie - converting csv files to arrays in NumPy - Matlab vs. Numpy comparison

Travis E. Oliphant oliphant.travis at ieee.org
Fri Jan 12 01:07:32 EST 2007


oyekomova wrote:
> Thanks for your help. I compared the following code in NumPy with the
> csvread in Matlab for a very large csv file. Matlab read the file in
> 577 seconds. On the other hand, this code below kept running for over 2
> hours. Can this program be made more efficient? FYI - The csv file was
> a simple 6 column file with a header row and more than a million
> records.
> 
> 
> import csv
> from numpy import array
> import time
> t1=time.clock()
> file_to_read = file('somename.csv','r')
> read_from = csv.reader(file_to_read)
> read_from.next()
> 
> datalist = [ map(float, row[:]) for row in read_from ]
> 
> # now the real data
> data = array(datalist, dtype = float)
> 
> elapsed=time.clock()-t1
> print elapsed
> 


If you use numpy.fromfile, you need to skip past the initial header row 
yourself.  Something like this:

import numpy
fid = open('somename.csv')
fid.readline()  # skip the header row
data = numpy.fromfile(fid, sep=',').reshape(-1,6)
# for 6-column data.
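
For comparison, here is a minimal sketch of a different approach that is not taken from the original post: numpy.loadtxt can skip the header row and split on commas itself. The sample data and the temporary file are made up for illustration and stand in for 'somename.csv'. No claim is made here about how its speed compares on very large files.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample standing in for 'somename.csv':
# one header row followed by 6-column numeric records.
sample = (
    "a,b,c,d,e,f\n"
    "1,2,3,4,5,6\n"
    "7,8,9,10,11,12\n"
)

# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # skiprows=1 drops the header row; delimiter=',' splits the columns.
    data = np.loadtxt(path, delimiter=",", skiprows=1)
finally:
    os.remove(path)

print(data.shape)
```

The result is a two-dimensional float array with one row per record, so no reshape is needed.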

-Travis



