[Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt

Sun Oct 26 14:40:52 EDT 2014

At 06:32 AM 10/26/2014, you wrote:
>On Sun, Oct 26, 2014 at 1:21 PM, Eelco Hoogendoorn
><hoogendoorn.eelco at gmail.com> wrote:
> > I'm not sure why the memory doubling is necessary. Isn't it possible to
> > preallocate the arrays and write to them?
>
>Not without reading the whole file first to know how many rows to preallocate.
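
The quoted point can be illustrated with a two-pass sketch. The hypothetical helper `load_two_pass` (not NumPy code) reads the file once only to count rows, rewinds, and then fills an array allocated once with its final shape. It assumes a seekable text file of delimiter-separated floats with no comment or header lines. The cost is reading the file twice instead of holding an intermediate copy of the data.

```python
import io

import numpy as np


def load_two_pass(f, delimiter=None):
    """Hypothetical helper: read a seekable text file twice.

    Pass 1 counts the non-blank lines; pass 2 fills an array that was
    allocated once with the final shape, so no list of rows is kept.
    """
    # First pass: count rows so the array can be preallocated.
    nrows = sum(1 for line in f if line.strip())
    f.seek(0)
    if nrows == 0:
        return np.empty((0, 0))
    out = None
    i = 0
    for line in f:
        if not line.strip():
            continue  # blank lines were not counted, so skip them here too
        values = [float(x) for x in line.split(delimiter)]
        if out is None:
            # Allocate once, taking the column count from the first row.
            out = np.empty((nrows, len(values)))
        if len(values) != out.shape[1]:
            raise ValueError(f"row {i} has {len(values)} columns, expected {out.shape[1]}")
        out[i] = values
        i += 1
    return out


# Example: the blank line is skipped in both passes.
sample = io.StringIO("1 2 3\n4 5 6\n\n7 8 9\n")
print(load_two_pass(sample).shape)  # (3, 3)
```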

It seems to me that loadtxt()
http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
should have an optional shape argument. I often know how many rows I have
(the number of data samples) from other metadata.
Then:
- if the file is shorter than expected (for instance, because you weren't
sure and padded your estimate), it could do one of:
     - zero-pad the array
     - raise an exception
     - return a truncated view
- if the file is longer than expected, it could:
     - raise an exception
     - return only the rows read so far (this would act like fileObject.read(size))
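
A minimal sketch of the proposed behaviour, assuming the caller supplies both the row and column counts. The function `load_known_rows` and its parameters `nrows`, `ncols`, `on_short` and `on_long` are hypothetical names for illustration, not existing np.loadtxt arguments; the sketch also assumes delimiter-separated float data without comments or headers. The array is allocated once up front, so no intermediate list of rows is built.

```python
import io

import numpy as np

SHORT_POLICIES = ("pad", "raise", "truncate")
LONG_POLICIES = ("raise", "stop")


def load_known_rows(f, nrows, ncols, delimiter=None,
                    on_short="pad", on_long="stop"):
    """Hypothetical loader that fills a preallocated (nrows, ncols) array.

    on_short: what to do if the file has fewer than nrows data lines
        "pad"      -> leave the missing rows as zeros
        "raise"    -> raise ValueError
        "truncate" -> return a view of the rows actually read
    on_long: what to do if the file has more than nrows data lines
        "raise"    -> raise ValueError
        "stop"     -> return the first nrows rows, like fileObject.read(size)
    """
    if on_short not in SHORT_POLICIES or on_long not in LONG_POLICIES:
        raise ValueError("unknown on_short/on_long policy")
    out = np.zeros((nrows, ncols))  # zeros, so "pad" needs no extra work
    i = 0
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        if i == nrows:
            if on_long == "raise":
                raise ValueError(f"file has more than {nrows} rows")
            break  # "stop": ignore everything after the first nrows rows
        values = [float(x) for x in line.split(delimiter)]
        if len(values) != ncols:
            raise ValueError(f"row {i} has {len(values)} columns, expected {ncols}")
        out[i] = values
        i += 1
    if i < nrows:
        if on_short == "raise":
            raise ValueError(f"expected {nrows} rows, found {i}")
        if on_short == "truncate":
            return out[:i]  # a view: the full buffer stays allocated
    return out


# Example: one row fewer than estimated, zero-padded by default.
sample = io.StringIO("1 2\n3 4\n")
print(load_known_rows(sample, nrows=3, ncols=2))  # last row is [0. 0.]
```

Note that the "truncate" result is a view, so it keeps the whole preallocated buffer alive; calling `.copy()` on it releases the unused rows at the price of one more allocation.
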
- Ray S 