[Numpy-discussion] use index array of len n to select columns of n x m array
Martin Spacek
numpy at mspacek.mm.st
Fri Aug 6 16:11:55 EDT 2010
On 2010-08-06 06:57, Keith Goodman wrote:
> You can speed it up by getting rid of two copies:
>
> idx = np.arange(a.shape[0])
> idx *= a.shape[1]
> idx += i
Keith, you're right of course. I'd forgotten about your earlier suggestion about
operating in-place. Here's my new version:
def rowtake(a, i):
    """For each row in a, return values according to column indices in the
    corresponding row in i. Returned shape == i.shape"""
    assert a.ndim == 2
    assert i.ndim <= 2
    if i.ndim == 1:
        j = np.arange(a.shape[0])
    else: # i.ndim == 2
        j = np.repeat(np.arange(a.shape[0]), i.shape[1])
        j.shape = i.shape
    j *= a.shape[1]
    j += i
    return a.flat[j]
>>> a = np.arange(20)
>>> a.shape = 5, 4
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])
>>> i = np.array([[2, 1],
                  [3, 1],
                  [1, 1],
                  [0, 0],
                  [3, 1]])
>>> timeit rowtake(a, i)
100000 loops, best of 3: 14.7 us per loop
>>> timeit rowtake_cy(a, i)
100000 loops, best of 3: 10.6 us per loop
So now it's almost as fast as the element-by-element Cython version.
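For anyone reading along, here is a minimal sketch of the index arithmetic behind rowtake: in a C-ordered 2-D array, element (r, c) sits at flat position r * ncols + c. The name rowtake_flat is hypothetical, and the sketch assumes i is 2-D with one row per row of a. It uses broadcasting to build the offsets instead of np.repeat, so it is a variation on the function above, not a copy of it.

```python
import numpy as np

def rowtake_flat(a, i):
    """Pick a[r, i[r, k]] for every r, k via flat indices (hypothetical helper)."""
    nrows, ncols = a.shape
    # Flat position of the first element of each row: 0, ncols, 2*ncols, ...
    row_start = np.arange(nrows) * ncols
    # row_start[:, None] has shape (nrows, 1) and broadcasts across i's columns
    j = row_start[:, None] + i
    return a.flat[j]

a = np.arange(20).reshape(5, 4)
i = np.array([[2, 1], [3, 1], [1, 1], [0, 0], [3, 1]])
result = rowtake_flat(a, i)
```

Because a here is just np.arange(20) reshaped, each value equals its own flat position, which makes the result easy to check by eye.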
On 2010-08-06 03:29, josef.pktd at gmail.com wrote:
> I still find broadcasting easier to read, even if it might be a bit slower
>
> >>> a[np.arange(5)[:,None], i]
> array([[ 2,  1],
>        [ 7,  5],
>        [ 9,  9],
>        [12, 12],
>        [19, 17]])
Josef, I'd forgotten you could use None to increase the dimensionality of an
array. Neat. And, somehow, it's almost twice as fast as the Cython version:
>>> timeit a[np.arange(a.shape[0])[:, None], i]
100000 loops, best of 3: 5.76 us per loop
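A short sketch of why the broadcasting form works, with the shapes spelled out. The variable names are only illustrative. np.take_along_axis is not part of this thread; it was added in later NumPy releases (1.15 and up) and is shown only as an assumed alternative for readers on a newer version.

```python
import numpy as np

a = np.arange(20).reshape(5, 4)
i = np.array([[2, 1], [3, 1], [1, 1], [0, 0], [3, 1]])

# Row indices as a column vector, shape (5, 1)
rows = np.arange(a.shape[0])[:, None]

# rows (5, 1) and i (5, 2) broadcast to (5, 2), so picked[r, k] == a[r, i[r, k]]
picked = a[rows, i]

# Assumed alternative on NumPy >= 1.15: same selection along axis 1
same = np.take_along_axis(a, i, axis=1)

# For a 1-D index array (one column per row), drop the [:, None]
i1 = i[:, 0]
picked_1d = a[np.arange(a.shape[0]), i1]
```

Note that keeping the [:, None] with a 1-D i would broadcast (5, 1) against (5,) and return a 5 x 5 array, which is not the per-row selection wanted here.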
I like it. Thanks for all the help!
Martin