Newbie - converting csv files to arrays in NumPy - Matlab vs. Numpy comparison

oyekomova oyekomova at hotmail.com
Sat Jan 13 14:13:57 EST 2007


Thanks to everyone for their excellent suggestions. I was able to
achieve the following results with your suggestions. However, I am
unable to get past a file size of 6 million rows. I would appreciate
any helpful suggestions on avoiding memory errors. None of the
solutions posted was able to cross this limit.

>>> Data size 999999
Elapsed 31.60352213
>>> ================================ RESTART ================================
>>>
Data size 1999999
Elapsed 63.4050884573
>>> ================================ RESTART ================================
>>>
Data size 4999999
Elapsed 177.888915777
>>>
Data size 5999999
Traceback (most recent call last):
  File "C:/Documents/some.py", line 27, in <module>
    read_test()
  File "C:/Documents/some.py", line 21, in read_test
    data   = array(data, dtype = float)
MemoryError
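The MemoryError comes from the `array(data, dtype=float)` step, which
holds the full Python list of rows and the final array in memory at the
same time. One lower-memory sketch (the 6-column float layout is an
assumption, and StringIO stands in for the real CSV file) feeds parsed
values straight into numpy.fromiter, so no big intermediate list is
built:

```python
import io
import numpy as np

# Small stand-in for the real CSV: a header row plus 6-column float rows.
csv_text = "a,b,c,d,e,f\n" + "\n".join(
    ",".join(str(float(i * 6 + j)) for j in range(6)) for i in range(4)
)

fid = io.StringIO(csv_text)
fid.readline()  # skip past the initial header row

def values(f):
    # Yield one float at a time; float() tolerates the trailing newline.
    for line in f:
        for field in line.split(','):
            yield float(field)

# fromiter fills the array directly from the generator, avoiding the
# temporary list-of-rows that triggers the MemoryError above.
data = np.fromiter(values(fid), dtype=float).reshape(-1, 6)
print(data.shape)  # (4, 6)
```

The peak memory cost is then roughly one copy of the array rather than
the array plus a list of millions of Python float objects.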

Robert Kern wrote:
> Travis E. Oliphant wrote:
>
> > If you use numpy.fromfile, you need to skip past the initial header row
> > yourself.  Something like this:
> >
> > fid = open('somename.csv')
>
> # I think you also meant to include this line:
> header = fid.readline()
>
> > data = numpy.fromfile(fid, sep=',').reshape(-1,6)
> > # for 6-column data.
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless enigma
>  that is made terrible by our own mad attempt to interpret it as though it had
>  an underlying truth."
>   -- Umberto Eco