[Numpy-discussion] Broadcasting and indexing

Emmanuelle Gouillart emmanuelle.gouillart at normalesup.org
Thu Jan 21 13:03:48 EST 2010


Hi Thomas,

Broadcasting rules apply only to ufuncs (and, by extension, to some numpy
functions that use ufuncs). Indexing obeys different rules and always starts
with the first dimension, so a[c] selects along axis 0.
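A minimal sketch of the difference, using small made-up arrays rather than the random ones from the original post. On current NumPy versions a boolean mask whose length does not match the first axis raises IndexError; the exact message wording varies by version, so it is not shown here.

```python
import numpy as np

a = np.arange(20.0).reshape(4, 5)
b = np.arange(5.0)

# Broadcasting (a ufunc, here +): b, shape (5,), is lined up with the
# LAST axis of a, so every row of a gets b added to it.
s = a + b

# Indexing: a 1-D boolean array on its own is applied to the FIRST axis.
rows = np.array([True, False, True, False])
picked = a[rows]          # rows 0 and 2, shape (2, 5)

# A length-5 mask does not fit the first axis (length 4), so it fails.
cols = np.array([True, False, True, False, True])
try:
    a[cols]
    raised = False
except IndexError:
    raised = True
```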

However, you don't need broadcasting for such indexing operations:
>>> a[:, c] = 0
sets the columns selected by c to zero.
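Applied to an array of ones with the mask c from the question, the column assignment looks like this (a minimal sketch; the 4x5 shape matches the question's array):

```python
import numpy as np

a = np.ones((4, 5))
c = np.array([1, 0, 1, 0, 1], dtype=bool)

# ':' keeps every row; c then selects along the second axis (columns).
a[:, c] = 0
print(a)   # columns 0, 2 and 4 are now zero in every row
```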

If you want to index along the 3rd dimension, you can use a[:, :, c],
etc. If the dimension along which you index is a variable, you can also
use the function np.rollaxis, which lets you change the order of the
dimensions of an array. You can then index along the first dimension
(a[c]) and change the order of the dimensions back. Here is an example:
>>> a = np.ones((3,4,5,6))
>>> c = np.array([1,0,1,0,1], dtype=bool)
>>> tmp_a = np.rollaxis(a, 2, 0)
>>> tmp_a.shape
(5, 3, 4, 6)
>>> tmp_a[c] = 0
>>> a = np.rollaxis(tmp_a, 0, 3)
>>> a.shape
(3, 4, 5, 6)
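Two related options, sketched under the assumption of a reasonably recent NumPy and not part of the original reply: build the index tuple programmatically, with one ':' (slice(None)) per axis before the target axis, or use np.moveaxis, which the NumPy docs recommend over np.rollaxis in newer code. Both np.rollaxis and np.moveaxis return views, so assigning through the reordered array already modifies the original one. The helper name zero_along_axis is hypothetical.

```python
import numpy as np

def zero_along_axis(a, c, axis):
    """Hypothetical helper: set to zero the entries of `a` selected by the
    1-D boolean mask `c` along dimension `axis`, modifying `a` in place."""
    axis = axis % a.ndim                 # accept negative axes such as -1
    # slice(None) is what ':' means inside [], one per axis before `axis`.
    index = (slice(None),) * axis + (c,)
    a[index] = 0
    return a

c = np.array([1, 0, 1, 0, 1], dtype=bool)

a = np.ones((3, 4, 5, 6))
zero_along_axis(a, c, axis=2)            # same effect as a[:, :, c] = 0

# The axis-reordering route: np.moveaxis (like np.rollaxis) returns a view,
# so assigning through it already changes the original array b.
b = np.ones((3, 4, 5, 6))
view = np.moveaxis(b, 2, 0)              # shape (5, 3, 4, 6)
view[c] = 0
```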

Hope this helps.

Cheers,

Emmanuelle

On Thu, Jan 21, 2010 at 11:37:09AM -0500, Thomas Robitaille wrote:
> Hello,

> I'm trying to understand how array broadcasting can be used for indexing. In the following, I use the term 'row' to refer to the first dimension of a 2D array, and 'column' to the second, just because that's how numpy prints them out.

> If I consider the following example:

> >>> a = np.random.random((4,5))
> >>> b = np.random.random((5,))
> >>> a + b
> array([[ 1.45499556,  0.60633959,  0.48236157,  1.55357393,  1.4339261 ],
>        [ 1.28614593,  1.11265001,  0.63308615,  1.28904227,  1.34070499],
>        [ 1.26988279,  0.84683018,  0.98959466,  0.76388223,  0.79273084],
>        [ 1.27859505,  0.9721984 ,  1.02725009,  1.38852061,  1.56065028]])

> I understand how this works: it behaves as described in

> http://docs.scipy.org/doc/numpy/reference/ufuncs.html#broadcasting

> So b is broadcast to shape (1,5), and because its first dimension is 1, the operation is applied to all rows.
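The alignment described here can be checked directly. A minimal sketch, assuming NumPy 1.17 or later for the seeded generator np.random.default_rng (the original post used np.random.random):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((4, 5))
b = rng.random(5)

# Broadcasting first pads b's shape on the left: (5,) -> (1, 5).
b_row = b[np.newaxis, :]

# The length-1 axis is then stretched to match a's 4 rows (no copy is made).
stretched = np.broadcast_to(b_row, a.shape)

same = np.array_equal(a + b, a + stretched)
```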

> Now I am trying to apply this to array indexing. So for example, I want to set specific columns, indicated by a boolean array, to zero, but the following fails:

> >>> c = np.array([1,0,1,0,1], dtype=bool)
> >>> a[c] = 0
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> IndexError: index (4) out of range (0<=index<3) in dimension 0

> However, if I reduce the size of c to 4, then it works, but it sets rows, not columns, to zero:

> >>> c = np.array([1,0,1,0], dtype=bool)
> >>> a[c] = 0
> >>> a
> array([[ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
>        [ 0.41526315,  0.7425491 ,  0.39872546,  0.56141914,  0.69795153],
>        [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
>        [ 0.40771227,  0.60209749,  0.7928894 ,  0.66089748,  0.91789682]])

> But I would have thought that the indexing array would be broadcast in the same way as for a sum, i.e. c would be broadcast to shape (1,5) and could then set certain columns in all rows to zero.

> Why does broadcasting seem to work differently for indexing than for operations like addition or multiplication? For background, I'm trying to write a routine that performs a set of operations between an n-d array (where n is not known in advance) and a 1D array. I can use broadcasting rules for most operations without knowing the dimensionality of the n-d array, but now I need to perform indexing, and the convention seems to change, which is a real issue.
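For a routine like the one described here, where the 1-D array should act on the last axis just as it does in a + b, one option (not part of the original thread) is Ellipsis indexing, a[..., c], which works for any number of dimensions. Another is np.where, which broadcasts its arguments and returns a new array. The helper name zero_last_axis is hypothetical; this is a sketch, not the thread's solution.

```python
import numpy as np

def zero_last_axis(a, c):
    """Hypothetical helper: zero the entries selected by the 1-D boolean
    mask `c` along the last axis of `a`, whatever a.ndim is (in place)."""
    # '...' expands to as many ':' as needed, so c lands on the last axis,
    # the same axis a 1-D array is lined up with when broadcasting.
    a[..., c] = 0
    return a

c = np.array([1, 0, 1, 0, 1], dtype=bool)
results = {shape: zero_last_axis(np.ones(shape), c)
           for shape in [(5,), (4, 5), (3, 4, 5)]}

# Alternative without indexing: np.where broadcasts c (shape (5,))
# against the last axis of x, just as x + b would.
x = np.ones((3, 4, 5))
masked = np.where(c, 0.0, x)             # new array; x is unchanged
```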

> Thanks in advance for any advice,

> Thomas
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


