[Numpy-discussion] Copy vs View for array[array] (was Histograms via indirect index arrays)
Tim Hochberg
tim.hochberg at cox.net
Fri Mar 17 14:21:01 EST 2006
I just figured I'd add a couple of thoughts outside that needlessly
contentious thread.
In theory I'm all for view semantics for an array indexed by an array
(I'm sure we have a good name for that, but it's escaping me). Indexing
in numpy can be confusing enough without some indexing operations
returning views and others copies. This is orthogonal to any issues of
performance.
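To make the inconsistency concrete, here is a small sketch against a current NumPy release (an assumption; the exact behavior in 2006-era numpy may differ in details). A basic slice is a view that shares memory with its parent, while indexing with an integer array returns an independent copy. `np.shares_memory` is used only to show the difference.

```python
import numpy as np

a = np.arange(10)
idx = np.array([2, 3, 4])

s = a[2:5]    # basic slicing: a view onto a's buffer
f = a[idx]    # indexing with an integer array: a fresh copy

s[0] = 100    # writes through to a[2]
f[1] = -1     # changes only the copy; a[3] keeps its value

# np.shares_memory(a, s) is True, np.shares_memory(a, f) is False
```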
In practice, I'm a bit skeptical. The result would need to be some sort
of pseudo array object (similar to array.flat). Operations on this
object would necessarily have worse performance than operations on a
normal array due to the added level of indirection. In some
circumstances it would also hold onto a lot of memory that might
otherwise be freed, since it holds a reference to the data for both the
original array and the index array. How much of an effect this would
have on the typical user is hard to say; any effects would certainly
depend a lot on usage patterns.
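A rough pure-Python sketch of such a pseudo array, only to make the trade-offs concrete. `IndexView`, `dense` and `materializations` are hypothetical names (nothing like this exists in numpy), and the sketch assumes a 1-D base array. It writes in-place updates straight through, but has to gather a fresh copy of base[idx] every time it takes part in ordinary arithmetic.

```python
import numpy as np


class IndexView:
    """Hypothetical lazy pseudo array standing in for base[idx].

    It keeps references to both the base array and the index array,
    so neither can be freed while the view is alive.
    """

    # Opt out of ufuncs so ndarray binary operators return NotImplemented
    # and Python falls back to our reflected methods (__radd__, __rmul__).
    __array_ufunc__ = None

    def __init__(self, base, idx):
        self.base = base
        self.idx = np.asarray(idx, dtype=np.intp)
        self.materializations = 0  # times we gathered into a new buffer

    def dense(self):
        # Arithmetic needs a contiguous buffer, so gather base[idx] into
        # a fresh copy; this is the extra cost paid on every use.
        self.materializations += 1
        return self.base[self.idx]

    def __iadd__(self, other):
        # In-place updates write straight through to the base array.
        self.base[self.idx] += _dense(other)
        return self

    def __add__(self, other):
        return self.dense() + _dense(other)

    def __radd__(self, other):
        return _dense(other) + self.dense()

    def __mul__(self, other):
        return self.dense() * _dense(other)

    def __rmul__(self, other):
        return _dense(other) * self.dense()


def _dense(x):
    """Gather an IndexView into a plain array; pass anything else through."""
    return x.dense() if isinstance(x, IndexView) else x


# Usage, mirroring the c, d = a[idx], b[idx] case in the post.
a = np.arange(10.0)
b = np.ones(10)
idx = np.array([1, 3, 5])

v = IndexView(a, idx)
v += 1               # a[1], a[3], a[5] become 2.0, 4.0, 6.0

c, d = IndexView(a, idx), IndexView(b, idx)
del a, b             # both full 10-element arrays stay alive via c.base, d.base
e = c + d            # gathers c and d once each: [3.0, 5.0, 7.0]
f = 2*c + d          # gathers both again: [5.0, 9.0, 13.0]
# c.materializations == d.materializations == 2
```

The materialization counter is the point of the exercise: every arithmetic use re-gathers the indexed elements, and `del a, b` frees nothing because the views still hold the whole base arrays.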
Here are some cases and guesses as to how I think things would work out
(idx is an array):

    a[idx] += 1          # Improved performance and, more importantly,
                         # result consistent with other indexing ops.

    c = a[idx] + b[idx]  # Probably neutral: creating the pseudo arrays
                         # should be fast, then they need to be copied
                         # anyway in the add.

    c, d = a[idx], b[idx]
    del a, b
    e = c + d
    f = 2*c + d          # Probably bad: need to copy the pseudo arrays
                         # into a contiguous buffer multiple times.
                         # Also holds extra memory.
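For comparison, a sketch of what the first case does today with copy semantics (assuming a current NumPy release). `a[idx] += 1` does update `a`, because Python expands it into `a.__setitem__(idx, a.__getitem__(idx).__iadd__(1))`. The same update through a named temporary silently goes nowhere, whereas with a slice it would write through.

```python
import numpy as np

a = np.zeros(5)
idx = np.array([0, 2, 4])

# Augmented assignment on the indexed expression: get, add, then set back.
a[idx] += 1      # a is now [1, 0, 1, 0, 1]

# Binding the indexed result to a name first yields an independent copy,
# so the in-place add never reaches `a`.
c = a[idx]
c += 1           # c is [2, 2, 2]; a is still [1, 0, 1, 0, 1]
```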
I'd like to see someone try it, but it's not high enough on my priority
list for me to dive into it (I'm blowing all my spare cycles and then
some on numexpr).
-tim