[Numpy-discussion] Problem migrating PDL's index() into NumPy

Wed Mar 17 09:36:59 EDT 2010

josef.pktd at gmail.com wrote:
> On Wed, Mar 17, 2010 at 7:12 AM, Miroslav Sedivy wrote:
>> There are two 2D arrays with dimensions: A[10000,1000] and B[10000,100].
>> The first dimension of both arrays corresponds to a list of 10000 objects.
>>
>> The array A contains for each of 10000 objects 1000 integer values
>> between 0 and 99, so that for each of 10000 objects a corresponding
>> value can be found in the array B.
>>
>> I need a new array C[10000,1000] with values from B the following way:
>>
>> for x in range(10000):
>>    for y in range(1000):
>>       C[x,y] = B[x,A[x,y]]
>>
>> In Perl's PDL, this can be done with $C = $B->index($A)
>>
>> If in NumPy I do C = B[A], then I do not get a [10000,1000] 2D array,
>> but rather a [10000,1000,1000] 3D array, in which I can find the correct
>> values on the following positions:
>>
>> for x in range(10000):
>>    for y in range(1000):
>>       C[x,y,y]
>>
>> which may seem nice, but it needs 1000 times more memory and very
>> probably 1000 times more time to calculate... Impossible with such large
>> arrays... :-(
>>
>> Could anyone help me, please?
> 
> try
> C = B[:,A]
> or
> C = B[np.arange(1000)[:,None], A]
> 
> I think, one of the two (or both) should work (but no time for trying it myself)
> Josef

Thank you, Josef, for responding.

Neither of them works correctly. The first one runs only as B.T[:,A], and 
that gives me the same _3D_ array as B[A].T

The second one tells me: ValueError: shape mismatch: objects cannot be 
broadcast to a single shape
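
The mismatch can be reproduced on small stand-in arrays. This is only a 
sketch, assuming the layout described in the original question (A as 
[objects, values], B as [objects, lookup]); the sizes and the names n_obj, 
n_val, n_lookup and rows are made up for illustration. The row index has to 
run over the objects, so its length must match A's first dimension, not its 
second:

```python
import numpy as np

# Small hypothetical stand-ins for the 10000 x 1000 and 10000 x 100 arrays.
n_obj, n_val, n_lookup = 6, 4, 3
rng = np.random.default_rng(0)
A = rng.integers(0, n_lookup, size=(n_obj, n_val))  # indices into B's second axis
B = rng.random((n_obj, n_lookup))                   # per-object lookup table

# Reference result from the explicit double loop in the original question.
C_loop = np.empty((n_obj, n_val), dtype=B.dtype)
for x in range(n_obj):
    for y in range(n_val):
        C_loop[x, y] = B[x, A[x, y]]

# A row index of length n_val has shape (n_val, 1). It cannot be broadcast
# against A's shape (n_obj, n_val), so the indexing fails.
try:
    B[np.arange(n_val)[:, None], A]
    mismatch = False
except (IndexError, ValueError):  # exception type depends on the NumPy version
    mismatch = True

# A row index of length n_obj has shape (n_obj, 1) and broadcasts to A's shape,
# giving C[x, y] = B[x, A[x, y]] without the 3D intermediate.
rows = np.arange(n_obj)[:, None]
C = B[rows, A]
```

With the real sizes this would read B[np.arange(10000)[:,None], A], provided 
the arrays really have the object axis first.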

Now I am using an iteration over all 10000 elements:

# C gets A's shape and dtype
C = np.empty_like(A)
# one lookup per object; the objects run along the second axis here
for i in range(10000):
    C[:,i] = B[:,i][A[:,i]]

which works perfectly. It is just a real pain to see such a for-loop in 
the NumPy world :-(
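
The loop above indexes the objects along the second axis, which suggests 
the arrays are stored as A[1000,10000] and B[100,10000]; that layout is an 
assumption here, not something stated in the thread. Under it, a loop-free 
version might look like the sketch below. The sizes and the names n_obj, 
n_val, n_lookup and cols are made up, and np.take_along_axis is only 
available in NumPy 1.15 and later:

```python
import numpy as np

# Small hypothetical stand-ins for A[1000,10000] and B[100,10000].
n_obj, n_val, n_lookup = 6, 4, 3
rng = np.random.default_rng(0)
A = rng.integers(0, n_lookup, size=(n_val, n_obj))  # indices into B's first axis
B = rng.random((n_lookup, n_obj))                   # one lookup column per object

# The per-object loop, used as the reference result. The dtype is taken from B,
# because np.empty_like(A) would inherit A's integer dtype.
C_loop = np.empty(A.shape, dtype=B.dtype)
for i in range(n_obj):
    C_loop[:, i] = B[:, i][A[:, i]]

# Integer-array indexing: cols has shape (n_obj,) and broadcasts against
# A's shape (n_val, n_obj), so C_fancy[j, i] = B[A[j, i], i].
cols = np.arange(n_obj)
C_fancy = B[A, cols]

# The same lookup expressed with take_along_axis along the first axis.
C_take = np.take_along_axis(B, A, axis=0)
```

Both forms return an array of A's shape directly, so the 3D intermediate 
never gets built.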

Thanks,
Miroslav