[Numpy-discussion] nonuniform scatter operations

Sun Sep 28 16:15:30 EDT 2008

On Sat, Sep 27, 2008 at 10:01 PM, Nathan Bell <wnbell at gmail.com> wrote:
> On Sun, Sep 28, 2008 at 12:34 AM, Geoffrey Irving <irving at naml.us> wrote:
>>
>> Is there an efficient way to implement a nonuniform scatter operation
>> in numpy?  Specifically, I want to do something like
>>
>> n,m = 100,1000
>> X = random.uniform(size=n)
>> K = random.randint(n, size=m)
>> Y = random.uniform(size=m)
>>
>> for k,y in zip(K,Y):
>>    X[k] += y
>>
>> but I want it to be fast.  The naive attempt "X[K] += Y" does not
>> work, since the slice assumes the indices don't repeat.
>>
>
> I don't know of a numpy solution, but in scipy you could use a sparse
> matrix to perform the operation.  I think the following does what you
> want.
>
> from scipy.sparse import coo_matrix
> X += coo_matrix((Y, (K, zeros(m, dtype=int))), shape=(n, 1)).sum(axis=1)
>
> This reduces to a simple C++ loop, so speed should be good:
> http://projects.scipy.org/scipy/scipy/browser/trunk/scipy/sparse/sparsetools/coo.h#L139

Thanks.  That works great.  A slightly cleaner version is

    X += coo_matrix((Y, (K, zeros_like(K))), shape=(n, 1)).sum(axis=1)
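
For anyone checking this against the original loop, here is a minimal sketch comparing the three variants on the 1-D problem. The names X_loop, X_naive, X_sparse and the fixed seed are only for illustration. It passes shape=(n, 1) explicitly so the result has n rows even if the largest index never occurs in K, and it flattens the (n, 1) matrix returned by sum(axis=1) before adding.

```python
import numpy as np
from scipy.sparse import coo_matrix

rng = np.random.RandomState(0)  # fixed seed, only so the run is repeatable
n, m = 100, 1000
X = rng.uniform(size=n)
K = rng.randint(n, size=m)
Y = rng.uniform(size=m)

# Reference result: the explicit Python loop from the original question.
X_loop = X.copy()
for k, y in zip(K, Y):
    X_loop[k] += y

# Naive fancy-index update: for repeated indices only one contribution survives.
X_naive = X.copy()
X_naive[K] += Y

# Sparse version: summing the n-by-1 COO matrix along axis 1 accumulates
# every entry of Y that shares a row index in K.
delta = coo_matrix((Y, (K, np.zeros_like(K))), shape=(n, 1)).sum(axis=1)
X_sparse = X + np.asarray(delta).ravel()  # flatten the (n, 1) matrix to (n,)

print(np.allclose(X_sparse, X_loop))
print(np.allclose(X_naive, X_loop))
```

Since m > n here, K must contain repeated indices, which is exactly the case where the naive update and the loop disagree.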

The next question is: is there a similar way that generalizes to the
case where X is n by 3 and Y is m by 3 (besides the obvious loop over
range(3), that is)?
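
Two possible approaches to the n by 3 case, sketched below under assumptions and not taken from this thread. The first builds a sparse n by m selection matrix S (a name chosen here for illustration) with S[K[j], j] = 1, so that S.dot(Y) sums the rows of Y by target row. The second uses np.add.at, which only exists in NumPy 1.8 and later, so it would not have been available when this message was written.

```python
import numpy as np
from scipy.sparse import coo_matrix

rng = np.random.RandomState(0)  # fixed seed, only so the run is repeatable
n, m = 100, 1000
X = rng.uniform(size=(n, 3))
K = rng.randint(n, size=m)
Y = rng.uniform(size=(m, 3))

# Reference result: loop over rows of Y, adding each to row K[j] of X.
X_loop = X.copy()
for k, y in zip(K, Y):
    X_loop[k] += y

# Option 1: sparse selection matrix. Column j has a single 1 in row K[j],
# so S.dot(Y) is an n-by-3 dense array of per-row sums.
S = coo_matrix((np.ones(m), (K, np.arange(m))), shape=(n, m))
X_sparse = X + S.dot(Y)

# Option 2: unbuffered in-place add (NumPy >= 1.8), which accumulates
# over repeated indices instead of keeping one contribution.
X_at = X.copy()
np.add.at(X_at, K, Y)

print(np.allclose(X_sparse, X_loop), np.allclose(X_at, X_loop))
```

Which option is faster probably depends on n, m, and the library versions, so it is worth timing both on the real data.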

Geoffrey