[Numpy-discussion] fast duplicate of array

Anne Archibald peridot.faceted at gmail.com
Sat Jan 23 19:29:42 EST 2010


2010/1/23 Alan G Isaac <aisaac at american.edu>:
> On 1/23/2010 5:01 PM, Anne Archibald wrote:
>> If both arrays are "C contiguous", or more generally contiguous blocks
>> of memory with the same strided structure, you might get faster
>> copying by flattening them first, so that it can go in a single
>> memcpy().
>
> I may misunderstand this. Did you just mean
> x.flat = y.flat
> ?

No, .flat constructs an iterator that traverses the object as if it
were flat. I had in mind accessing the underlying data through views
that were flat:

In [3]: x = np.random.random((1000,1000))

In [4]: y = np.random.random((1000,1000))

In [5]: xf = x.view()

In [6]: xf.shape = (-1,)

In [7]: yf = y.view()

In [8]: yf.shape = (-1,)

In [9]: yf[:] = xf[:]
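
The same idea in a current NumPy, as a minimal sketch. It assumes both arrays have the same shape. For a C-contiguous array, reshape(-1) returns a view onto the same buffer rather than a copy, whereas .flat returns a numpy.flatiter object. The helper name flat_copy is hypothetical, introduced only for illustration.

```python
import numpy as np

def flat_copy(dst, src):
    """Hypothetical helper: copy src into dst through 1-D views when possible."""
    if dst.shape != src.shape:
        raise ValueError("dst and src must have the same shape")
    if dst.flags.c_contiguous and src.flags.c_contiguous:
        # For C-contiguous arrays reshape(-1) is a view onto the same
        # buffer, so this writes straight into dst's memory as one 1-D run.
        dst.reshape(-1)[:] = src.reshape(-1)
    else:
        # Layouts differ or are non-contiguous: fall back to the ordinary
        # n-dimensional assignment and let NumPy choose the loop order.
        dst[...] = src
    return dst

x = np.random.random((1000, 1000))
y = np.empty_like(x)

print(type(x.flat).__name__)               # flatiter: an iterator, not an array
print(np.shares_memory(x.reshape(-1), x))  # True: a flat view of the same data
flat_copy(y, x)
print(np.array_equal(x, y))                # True
```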

The assignment yf[:] = xf[:] may still use a loop instead of a memcpy();
if so, you'd want to look for an explicit memcpy()-based implementation.
Still, the flat version sidesteps a problem with multidimensional
arrays: they involve (in principle, anyway) nested loops, which may not
be executed in the cache-optimal order. Ideally numpy would
automatically notice when an operation can be done on flattened
versions of the arrays and eliminate some of the looping and indexing,
but I wouldn't count on it. At one point I remember finding that the
loops were reordered not for cache locality but to make the inner loop
run over the biggest dimension (to minimize looping overhead).

Anne


> If so, I find that to be *much* slower.
>
> Thanks,
> Alan
>
>
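> import time
> import numpy as np
>
> x = np.random.random((1000,1000))
> y = x.copy()
> # 1) allocate a fresh copy on every iteration
> t0 = time.perf_counter()
> for t in range(1000): x = y.copy()
> print(time.perf_counter() - t0)
> # 2) slice assignment into the existing array
> t0 = time.perf_counter()
> for t in range(1000): x[:,:] = y
> print(time.perf_counter() - t0)
> # 3) assignment through .flat iterators
> t0 = time.perf_counter()
> for t in range(1000): x.flat = y.flat
> print(time.perf_counter() - t0)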
