[Numpy-discussion] Re: 'append' array method request.

Robert Kern robert.kern at gmail.com
Fri Apr 21 15:13:07 EDT 2006


Robert Hetland wrote:
> 
> I find myself writing things like
> 
> x = []; y = []; t = []
> for line in open(filename).readlines():
>     xstr, ystr, tstr = line.split()
>     x.append(float(xstr))
>     y.append(float(ystr))
>     t.append(dateutil.parser.parse(tstr))  # or something similar
> x = asarray(x)
> y = asarray(y)
> t = asarray(t)
> 
> I think it would be nice to be able to create empty arrays, and append
> the values onto the end as I loop through the file without creating the
> intermediate list. Is this reasonable?

Not in the core array object, no. We can't make the underlying pointer point to
something else (because you've just reallocated the whole memory block to add an
item to the array) without invalidating all of the views on that array. This is
also the reason that numpy arrays can't use the standard library's array module
as their storage. That said:

> Is there a way to do this with
> existing methods or functions that I am missing? Is there a better way
> altogether?

We've done performance tests before. The fastest way that I've found is to use
the stdlib array module to accumulate values (it uses the same preallocation
strategy that Python lists use, and you can't create views from them, so you are
always safe) and then create the numpy array using fromstring on that object
(stdlib arrays obey the buffer protocol, so they will be treated like strings of
binary data). I posted timings one or two or three years ago on one of the scipy
lists.
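
A minimal sketch of that approach, for illustration only. The function name
load_columns and the two-column input are hypothetical. The sketch uses
np.frombuffer, which reads the same buffer interface as the binary mode of
fromstring (that mode is deprecated in current numpy), and it assumes purely
numeric columns; a timestamp column would still need a list or a conversion
to a number first.

```python
from array import array

import numpy as np


def load_columns(lines):
    """Accumulate two float columns in stdlib arrays, then convert to numpy."""
    # Typecode 'd' is a C double, which matches numpy's float64.
    x = array('d')
    y = array('d')
    for line in lines:
        xstr, ystr = line.split()[:2]
        x.append(float(xstr))
        y.append(float(ystr))
    # frombuffer wraps the stdlib array's memory without copying it.
    # The .copy() detaches the result, so the numpy arrays own their data
    # and the stdlib arrays remain free to grow afterwards.
    return (np.frombuffer(x, dtype=np.float64).copy(),
            np.frombuffer(y, dtype=np.float64).copy())


x, y = load_columns(["1.0 2.0", "3.5 4.5"])
```

Dropping the .copy() avoids one extra copy, but the numpy array then shares
memory with the stdlib array, and Python will refuse to append to the stdlib
array for as long as that numpy array is alive.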

However, lists are fine if you don't need blazing speed/low memory usage.

-- 
Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco




