[Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal
Eric Firing
efiring at hawaii.edu
Thu Apr 2 14:03:27 EDT 2015
On 2015/04/02 4:15 AM, Jaime Fernández del Río wrote:
> We probably need more traction on the "should this be done?" discussion
> than on the "can this be done?" one, the need for a reordering of the
> axes swings me slightly in favor, but I mostly don't see it yet.
As a long-time user of numpy, and an advocate and teacher of Python for
science, here is my perspective:
Fancy indexing is a horrible design mistake--a case of cleverness run
amok. As you can read in the NumPy documentation, it is hard to
explain, hard to understand, and hard to remember. Its use easily leads to
unreadable code and hard-to-see errors. Here is the essence of an
example that a student presented me with just this week, in the context
of reordering eigenvectors based on argsort applied to eigenvalues:
In [25]: xx = np.arange(2*3*4).reshape((2, 3, 4))
In [26]: ii = np.arange(4)
In [27]: print(xx[0])
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
In [28]: print(xx[0, :, ii])
[[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]
Quickly now, how many numpy users would look at that last expression and
say, "Of course, that is equivalent to transposing xx[0]"? And, "Of
course that expression should give a completely different result from
xx[0][:, ii]."?
I would guess fewer than 1%. That should tell you right away
that we have a real problem here. Fancy indexing can't be *read* by a
sub-genius--it has to be laboriously figured out piece by piece, with
frequent reference to the baffling descriptions in the NumPy docs.
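
For anyone who wants to check the two claims above, here is a minimal sketch that can be run as a script. The names `combined`, `two_step`, and `order` are illustrative only and not from the student's code; `order` is a made-up permutation standing in for what argsort might return.

```python
import numpy as np

xx = np.arange(2 * 3 * 4).reshape((2, 3, 4))
ii = np.arange(4)

# Combined index: the scalar 0 and the array ii are both treated as
# advanced indices. Because a slice sits between them, the broadcast
# dimension (length 4) is placed first, ahead of the sliced axis.
combined = xx[0, :, ii]

# Two-step index: select the first block, then reorder its columns.
two_step = xx[0][:, ii]

print(combined.shape)                      # (4, 3)
print(two_step.shape)                      # (3, 4)
print(np.array_equal(combined, xx[0].T))   # True
print(np.array_equal(two_step, xx[0]))     # True, because ii is the identity order

# Hypothetical non-trivial permutation, such as argsort might return.
order = np.array([2, 0, 3, 1])
print(np.array_equal(xx[0, :, order], xx[0][:, order].T))  # True
```

The two-step form is probably what someone reordering eigenvector columns has in mind; the combined form silently returns the transpose of it.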
So I think you should turn the question around and ask, "What is the
actual real-world use case for fancy indexing?" How often does real
code rely on it? I have taken advantage of it occasionally, maybe you
have too, but I think a survey of existing code would show that the need
for it is *far* less common than the need for simple orthogonal
indexing. That tells me that it is fancy indexing, not orthogonal
indexing, that should be available through a function and/or special
indexing attribute. The question is then how to make that transition.
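
To make the distinction concrete, here is a small sketch contrasting the two behaviors on a 2-D array. It uses the existing `np.ix_` helper to get the orthogonal result; that is only one way to spell it today, not a proposal for the eventual interface. The array `a` and the index lists are made up for illustration.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))
rows = [0, 2]
cols = [1, 3]

# Fancy indexing: the index lists are broadcast together and paired up,
# so this picks the elements at (0, 1) and (2, 3).
pairs = a[rows, cols]

# Orthogonal indexing: every selected row crossed with every selected
# column, giving a 2x2 sub-block.
block = a[np.ix_(rows, cols)]

print(pairs)   # [ 1 11]
print(block)   # [[ 1  3]
               #  [ 9 11]]
```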
Eric