[Numpy-discussion] Copy vs View for array[array] (was Histograms via indirect index arrays)

Fri Mar 17 14:21:01 EST 2006

I just figured I'd add a couple of thoughts outside that needlessly
contentious thread.

In theory I'm all for view semantics for an array indexed by an array 
(I'm sure we have a good name for that, but it's escaping me). Indexing 
in numpy can be confusing enough without some indexing operations 
returning views and others copies. This is orthogonal to any issues of 
performance.
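
To make the inconsistency concrete, here is a minimal sketch of the two behaviours as they exist in current numpy, assuming a release recent enough to provide np.shares_memory. The array names and values are made up for illustration.

```python
import numpy as np

a = np.arange(10)

# Basic slicing returns a view: writes through the slice reach `a`.
s = a[2:5]
s[:] = 0
slice_shares_memory = np.shares_memory(a, s)

# Indexing with an integer array returns a copy: `a` is unaffected.
idx = np.array([6, 7, 8])
c = a[idx]
c[:] = -1
index_shares_memory = np.shares_memory(a, c)

print(a)                                          # [0 1 0 0 0 5 6 7 8 9]
print(slice_shares_memory, index_shares_memory)   # True False
```

The two indexing expressions look almost identical at the call site, which is the source of the confusion described above.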

In practice, I'm a bit skeptical. The result would need to be some sort 
of pseudo array object (similar to array.flat). Operations on this
object would necessarily have worse performance than operations on a 
normal array due to the added level of indirection. In some 
circumstances it would also hold onto a lot of memory that might 
otherwise be freed since it holds a reference to the data for both the
original array and the index array. How much of an effect this would 
have on the typical user is hard to say; any effects would certainly 
depend a lot on usage patterns.
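
As a rough sketch of what such a pseudo array might look like, consider the toy class below. IndexedView and its materialize method are hypothetical names invented for illustration; nothing like this exists in numpy, and a real implementation would have to support the full array interface.

```python
import numpy as np

class IndexedView:
    """Hypothetical lazy view of base[idx]; not part of numpy."""

    def __init__(self, base, idx):
        # Both arrays stay alive for as long as the view does,
        # which is the extra memory cost mentioned above.
        self.base = base
        self.idx = np.asarray(idx)

    def __getitem__(self, i):
        # Extra level of indirection: look up idx first, then base.
        return self.base[self.idx[i]]

    def __setitem__(self, i, value):
        # Writes reach the original array, which is what makes it a view.
        self.base[self.idx[i]] = value

    def materialize(self):
        # Gather the selected elements into a fresh contiguous copy.
        return self.base[self.idx]

a = np.arange(10.0)
v = IndexedView(a, [1, 3, 5])
v[0] = 100.0          # modifies a[1] through the view
```

Every element access goes through two lookups, and any operation that needs a contiguous buffer has to call something like materialize first.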

Here are some cases and guesses as to how I think things would work out
(idx is an array):

a[idx] += 1             # Improved performance and, more importantly, a
                        # result consistent with other indexing ops.

c = a[idx] + b[idx]     # Probably neutral: creating the pseudo arrays
                        # should be fast, then they need to be copied
                        # anyway in the add.

c, d = a[idx], b[idx]
del a, b
e = c + d
f = 2*c + d             # Probably bad: need to copy the pseudo array
                        # into a contiguous buffer multiple times.
                        # Also holds extra memory.
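
For reference on the first case, the sketch below shows what a[idx] += 1 does under today's copy semantics when idx contains a repeated index, next to an explicit loop. The arrays are made-up examples, and how a true view would treat repeated indices is not something settled here.

```python
import numpy as np

idx = np.array([0, 0, 2])

# Copy semantics: a[idx] is gathered into a temporary, incremented,
# then scattered back, so the repeated index 0 is bumped only once.
a = np.zeros(4, dtype=int)
a[idx] += 1
print(a)   # [1 0 1 0]

# An explicit loop visits index 0 twice.
b = np.zeros(4, dtype=int)
for i in idx:
    b[i] += 1
print(b)   # [2 0 1 0]
```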

I'd like to see someone try it, but it's not high enough on my priority 
list for me to dive into it (I'm blowing all my spare cycles and then 
some on numexpr).

-tim