[Numpy-discussion] Multidimensional Indexing

Wed Apr 8 14:36:51 EDT 2015

On Mon, Apr 6, 2015 at 4:49 PM, Nicholas Devenish <misnomer at gmail.com> wrote:
> With the indexing example from the documentation:
>
> y = np.arange(35).reshape(5,7)
>
> Why does selecting an item from explicitly every row work as I’d expect:
> >>> y[np.array([0,1,2,3,4]),np.array([0,0,0,0,0])]
> array([ 0,  7, 14, 21, 28])
>
> But doing so from a full slice (which, I would naively expect to mean “Every Row”) has some…other… behaviour:
>
> >>> y[:,np.array([0,0,0,0,0])]
> array([[ 0,  0,  0,  0,  0],
>        [ 7,  7,  7,  7,  7],
>        [14, 14, 14, 14, 14],
>        [21, 21, 21, 21, 21],
>        [28, 28, 28, 28, 28]])
>
> What is going on in this example, and how do I get what I expect? By explicitly passing in an extra array with value===index? What is the rationale for this difference in behaviour?
>

To understand this example, it is important to understand that for
multi-dimensional arrays, NumPy broadcasts the index arrays for all
dimensions to a common shape and then pairs them up element by
element. In your original example,
y[np.array([0,1,2,3,4]),np.array([0,0,0,0,0])], both arrays already
have shape (5,), so you get y[0,0], y[1,0], ..., y[4,0], and the
behavior is as you'd expect.
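
As a quick check of that pairing, here is a minimal sketch. The names
rows, cols, picked and by_hand are mine, chosen only for illustration:

```python
import numpy as np

y = np.arange(35).reshape(5, 7)
rows = np.array([0, 1, 2, 3, 4])
cols = np.array([0, 0, 0, 0, 0])

# Both index arrays have shape (5,), so they are paired element by element.
picked = y[rows, cols]

# The same selection written out as an explicit loop over (row, col) pairs.
by_hand = np.array([y[r, c] for r, c in zip(rows, cols)])

print(picked)  # [ 0  7 14 21 28]
```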

In the second case, the first index is a slice, and the second index
is an array. Documentation for this case can be found in the indexing
docs under "Combining index arrays with slices". Here's the relevant
portion:

> In effect, the slice is converted to an [new] index array ... that is broadcast with the [other] index array

So in your case, the slice ":" is *first* being converted to
np.arange(5), and *then* it is broadcast against the [other] index
array. The slice keeps an axis of its own, so it behaves like a column
of row numbers and is ultimately transformed into something like
np.repeat(np.arange(5)[:,np.newaxis], 5, axis=1), giving you:

array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4]])

Now at this point you have converted your slice to a [new] index
array of shape (5,5), and your [other] index array is shaped (5,).

So now numpy applies broadcasting rules to the second array to get it
into shape (5,5) as well. A (5,) array is broadcast to (5,5) by using
it as every row, so your [other] index array, which is all zeros,
looks like:

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

The two (5,5) arrays are then paired element by element, so output
element [i, j] is y[i, 0], which gives the result you saw.
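
The broadcasting can be checked by building the explicit index arrays
by hand. This is only a sketch: row_idx, rows and cols are names I made
up, and NumPy does not necessarily build these arrays internally.

```python
import numpy as np

y = np.arange(35).reshape(5, 7)
idx = np.array([0, 0, 0, 0, 0])

# Stand-in for the slice ":" : a column of row numbers, shape (5, 1).
row_idx = np.arange(5)[:, np.newaxis]

# Broadcast both index arrays to the common shape (5, 5).
rows, cols = np.broadcast_arrays(row_idx, idx)

# rows[i, j] == i and cols[i, j] == 0, so each output element is y[i, 0].
explicit = y[rows, cols]
sliced = y[:, idx]

print(np.array_equal(explicit, sliced))  # True
```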

Now, you may ask: once the slice was converted to np.arange(5), why
was it then broadcast to shape (5,5) rather than kept at shape (5,),
which would also work? The reason (I suspect, at least) is to keep it
consistent with other types of slices. Consider if you did something
like:

y[1:3, np.array([0,0,0,0,0])]

Then the same operation would apply as above, except that when the
slice was converted to an array, it would be converted to
np.arange(1,3), which has shape (2,). A (2,) array cannot be paired
element by element with the second index array of shape (5,), so the
slice *has* to get an axis of its own and be broadcast, and the result
has shape (2,5).
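
A small sketch of that case, with variable names of my own choosing.
It shows that the slice form works, that two plain index arrays of
shapes (2,) and (5,) are rejected, and that giving the row indices
their own axis reproduces the slice result:

```python
import numpy as np

y = np.arange(35).reshape(5, 7)
idx = np.array([0, 0, 0, 0, 0])

# Slice plus index array: the slice keeps its own axis.
sub = y[1:3, idx]
print(sub.shape)  # (2, 5)

# Two plain index arrays of shapes (2,) and (5,) cannot be broadcast.
try:
    y[np.arange(1, 3), idx]
    raised = False
except IndexError:
    raised = True

# Giving the row indices shape (2, 1) makes the shapes compatible again.
same = y[np.arange(1, 3)[:, np.newaxis], idx]
```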

One final note: in this case, you can instead use either of the following:

y[np.array([0,1,2,3,4]), 0]

or

y[:, 0]

Using the same steps as above, the slice is in effect converted to
np.arange(5), and then the shapes are compared: (5,) versus (). The
integer index is then broadcast to shape (5,), which gives you what
you want.
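
To confirm that both forms return the first column, here is a short
check (the names with_array and with_slice are mine):

```python
import numpy as np

y = np.arange(35).reshape(5, 7)

# Index array for the rows, plain integer for the column.
with_array = y[np.array([0, 1, 2, 3, 4]), 0]

# Full slice for the rows, plain integer for the column.
with_slice = y[:, 0]

print(with_array)  # [ 0  7 14 21 28]
print(with_slice)  # [ 0  7 14 21 28]
```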

Hope that helps.