[Numpy-discussion] use index array of len n to select columns of n x m array

Fri Aug 6 16:11:55 EDT 2010

On 2010-08-06 06:57, Keith Goodman wrote:
 > You can speed it up by getting rid of two copies:
 >
 > idx = np.arange(a.shape[0])
 > idx *= a.shape[1]
 > idx += i
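A minimal sketch of the saving Keith describes. It assumes the earlier version built the flat index in a single expression such as `np.arange(n) * ncols + i`; that version isn't quoted here. The array `a` and the 1-D index array `i` below are example data. The one-liner allocates a new array for the product and another for the sum. The in-place version allocates only the `arange` buffer and then overwrites it. Both produce the same flat indices.

```python
import numpy as np

a = np.arange(20).reshape(5, 4)
i = np.array([2, 3, 1, 0, 3])  # example data: one column index per row

# Out-of-place (assumed earlier form): arange, the product and the sum
# each allocate a fresh array.
idx_copy = np.arange(a.shape[0]) * a.shape[1] + i

# In-place: only arange allocates; *= and += reuse that buffer.
idx = np.arange(a.shape[0])
idx *= a.shape[1]
idx += i

assert np.array_equal(idx, idx_copy)
print(a.flat[idx])  # [ 2  7  9 12 19]
```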

Keith, you're right, of course. I'd forgotten your earlier suggestion to operate
in place. Here's my new version:

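import numpy as np

def rowtake(a, i):
    """For each row in a, return values according to column indices in the
    corresponding row in i. Returned shape == i.shape"""
    assert a.ndim == 2
    assert i.ndim <= 2
    if i.ndim == 1:
        # one column index per row
        j = np.arange(a.shape[0])
    else:  # i.ndim == 2
        # each row number repeated once per column index in that row
        j = np.repeat(np.arange(a.shape[0]), i.shape[1])
        j.shape = i.shape
    # turn (row, col) pairs into flat C-order indices, in place:
    # j = row * ncols + col
    j *= a.shape[1]
    j += i
    return a.flat[j]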
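The index arithmetic in `rowtake` is row-major (C-order) flattening, `row * ncols + col`. NumPy's `np.ravel_multi_index` computes the same thing. The variant below is a sketch with a hypothetical name, `rowtake_ravel`; it isn't code from this thread. One design difference is worth knowing about. With the manual arithmetic, an out-of-range column index silently reads from a neighbouring row. `ravel_multi_index` with its default `mode='raise'` raises `ValueError` instead.

```python
import numpy as np

def rowtake_ravel(a, i):
    """Hypothetical variant of rowtake: same result, but the flat indices
    come from np.ravel_multi_index, which rejects out-of-range columns."""
    a = np.asarray(a)
    i = np.asarray(i)
    assert a.ndim == 2 and i.ndim in (1, 2)
    rows = np.arange(a.shape[0])
    if i.ndim == 2:
        # read-only (nrows, ncols_i) view of the row numbers, matching i
        rows = np.broadcast_to(rows[:, None], i.shape)
    flat = np.ravel_multi_index((rows, i), a.shape)  # row * ncols + col
    return a.flat[flat]
```

For example, `rowtake_ravel(a, np.array([4, 0, 0, 0, 0]))` on the 5 x 4 `a` used below raises `ValueError`. `rowtake` would instead return `a.flat[4]`, which is the first element of row 1.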

 >>> a = np.arange(20)
 >>> a.shape = 5, 4
 >>> a
array([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19]])
 >>> i = np.array([[2, 1],
 ...               [3, 1],
 ...               [1, 1],
 ...               [0, 0],
 ...               [3, 1]])
 >>> timeit rowtake(a, i)
100000 loops, best of 3: 14.7 us per loop
 >>> timeit rowtake_cy(a, i)
100000 loops, best of 3: 10.6 us per loop
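The Cython `rowtake_cy` timed above wasn't posted. The function below is a hypothetical stand-in that writes out the same element-by-element rule as a plain Python loop: `out[r, k] = a[r, i[r, k]]`. It's much slower than either version. Use it only as a reference to check results against, not to time against.

```python
import numpy as np

def rowtake_loop(a, i):
    """Hypothetical pure-Python reference: out[r, k] = a[r, i[r, k]].
    Not the (unposted) Cython code; only meant for checking results."""
    a = np.asarray(a)
    i = np.asarray(i)
    i2 = i.reshape(a.shape[0], -1)  # a 1-D i becomes one column per row
    out = np.empty(i2.shape, dtype=a.dtype)
    for r in range(i2.shape[0]):
        for k in range(i2.shape[1]):
            out[r, k] = a[r, i2[r, k]]
    return out.reshape(i.shape)
```

With `a` and `i` as in the session above, `np.array_equal(rowtake(a, i), rowtake_loop(a, i))` should hold.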

So now it's within about 40% of the element-by-element Cython version (14.7 us
vs. 10.6 us per loop).

On 2010-08-06 03:29, josef.pktd at gmail.com wrote:
 > I still find broadcasting easier to read, even if it might be a bit slower
 >
 >>>> a[np.arange(5)[:,None], i]
 > array([[ 2,  1],
 >        [ 7,  5],
 >        [ 9,  9],
 >        [12, 12],
 >        [19, 17]])
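Josef's expression works because indexing with two integer arrays broadcasts them against each other. A `(5, 1)` column of row numbers paired with the `(5, 2)` array `i` broadcasts to `(5, 2)`, so every element of the result is `a[r, i[r, k]]`. The sketch below shows the shapes involved.

```python
import numpy as np

a = np.arange(20).reshape(5, 4)
i = np.array([[2, 1],
              [3, 1],
              [1, 1],
              [0, 0],
              [3, 1]])

# Indexing with None (np.newaxis) inserts a length-1 axis: (5,) -> (5, 1).
rows = np.arange(a.shape[0])[:, None]
print(rows.shape, i.shape)          # (5, 1) (5, 2)

# The two index arrays broadcast to a common shape, and
# result[r, k] == a[rows[r, 0], i[r, k]] == a[r, i[r, k]].
print(np.broadcast(rows, i).shape)  # (5, 2)
result = a[rows, i]
print(result)
```

If `i` is 1-D (one column per row), no extra axis is needed: `a[np.arange(a.shape[0]), i]` already pairs row `r` with `i[r]`.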

Josef, I'd forgotten you could index with None (np.newaxis) to add an axis to an
array. Neat. And, somehow, it's almost twice as fast as the Cython version
(5.76 us vs. 10.6 us per loop):

 >>> timeit a[np.arange(a.shape[0])[:, None], i]
100000 loops, best of 3: 5.76 us per loop
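A note for readers on newer NumPy. Since version 1.15, which came well after this thread, `np.take_along_axis` wraps the same broadcasting-index pattern. The snippet below is a sketch that assumes NumPy >= 1.15. It hasn't been timed against the versions above.

```python
import numpy as np

a = np.arange(20).reshape(5, 4)
i = np.array([[2, 1], [3, 1], [1, 1], [0, 0], [3, 1]])

# Picks a[r, i[r, k]] along axis 1; i must have the same ndim as a.
picked = np.take_along_axis(a, i, axis=1)
assert np.array_equal(picked, a[np.arange(a.shape[0])[:, None], i])

# For a 1-D i (one column per row), add an axis, then drop it again.
i1 = np.array([2, 3, 1, 0, 3])
picked1 = np.take_along_axis(a, i1[:, None], axis=1)[:, 0]
print(picked1)  # [ 2  7  9 12 19]
```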

I like it. Thanks for all the help!

Martin