[Numpy-discussion] Problem migrating PDL's index() into NumPy

Wed Mar 17 10:01:48 EDT 2010

On Wed, Mar 17, 2010 at 9:36 AM, Miroslav Sedivy
<miroslav.sedivy at weather-consult.com> wrote:
> josef.pktd at gmail.com wrote:
>> On Wed, Mar 17, 2010 at 7:12 AM, Miroslav Sedivy wrote:
>>> There are two 2D arrays with dimensions: A[10000,1000] and B[10000,100].
>>> The first dimension of both arrays corresponds to a list of 10000 objects.
>>>
>>> For each of the 10000 objects, the array A contains 1000 integer values
>>> between 0 and 99, each of which selects the corresponding value in that
>>> object's row of the array B.
>>>
>>> I need a new array C[10000,1000] with values from B the following way:
>>>
>>> for x in range(10000):
>>>    for y in range(1000):
>>>       C[x,y] = B[x,A[x,y]]
>>>
>>> In Perl's PDL, this can be done with $C = $B->index($A)
>>>
>>> If in NumPy I do C = B[A], then I do not get a [10000,1000] 2D array,
>>> but rather a [10000,1000,1000] 3D array, in which I can find the correct
>>> values on the following positions:
>>>
>>> for x in range(10000):
>>>    for y in range(1000):
>>>       C[x,y,y]
>>>
>>> which may seem nice, but it needs 1000 times more memory and very
>>> probably 1000 times more time to calculate... Impossible with such large
>>> arrays... :-(
>>>
>>> Could anyone help me, please?
>>
>> try
>> C = B[:,A]
>> or
>> C = B[np.arange(1000)[:,None], A]
>>
>> I think one of the two (or both) should work (but I had no time to try it myself).
>> Josef
>
>
> Thank you, Josef, for responding.
>
> Neither of them works correctly. The first one works only as B.T[:,A] and
> gives me the same _3D_ array as B[A].T
>
> The second one tells me: ValueError: shape mismatch: objects cannot be
> broadcast to a single shape

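That is because you have 10000 rows, not the 1000 in the example I typed.
Index arrays are broadcast against each other, so their shapes must be
compatible: np.arange(10000)[:,None] has shape (10000,1), which broadcasts
with A's shape (10000,1000).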
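
A small sketch of that broadcasting rule, using made-up sizes (6 rows, 4 lookups per row, 3 values per row) in place of 10000, 1000 and 100. The row index built with [:, None] has shape (n_rows, 1), which broadcasts against A's shape (n_rows, n_cols); a row index of the wrong length triggers the shape-mismatch error Miroslav saw. The exception class depends on the NumPy version, so the sketch catches both ValueError and IndexError.

```python
import numpy as np

# Hypothetical small sizes standing in for 10000 rows, 1000 lookups, 100 values
n_rows, n_cols, n_vals = 6, 4, 3

rng = np.random.default_rng(0)
B = rng.random((n_rows, n_vals))                    # values for each object (row)
A = rng.integers(0, n_vals, size=(n_rows, n_cols))  # column indices into each row of B

# The row index has shape (n_rows, 1), so it broadcasts against A's (n_rows, n_cols)
rows = np.arange(n_rows)[:, None]
C = B[rows, A]
print(C.shape)  # (6, 4)

# Check against the explicit definition C[x, y] = B[x, A[x, y]]
for x in range(n_rows):
    for y in range(n_cols):
        assert C[x, y] == B[x, A[x, y]]

# A row index of the wrong length, shape (5, 1), cannot broadcast with (6, 4)
try:
    B[np.arange(5)[:, None], A]
    raised = False
except (IndexError, ValueError) as exc:  # ValueError in 2010-era NumPy, IndexError in recent releases
    raised = True
    print(type(exc).__name__, exc)
```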


>>> import numpy as np
>>> n0 = 5  # number of rows
>>> B = np.ones((n0,3))*np.arange(3)
>>> A = np.random.randint(3,size=(n0,3))
>>> C = B[np.arange(n0)[:,None],A]
>>> assert (A == C).all()
>>> A
array([[2, 0, 1],
       [2, 0, 1],
       [2, 1, 2],
       [0, 0, 2],
       [2, 0, 0]])
>>> C
array([[ 2.,  0.,  1.],
       [ 2.,  0.,  1.],
       [ 2.,  1.,  2.],
       [ 0.,  0.,  2.],
       [ 2.,  0.,  0.]])
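
For comparison, a sketch (again with hypothetical small sizes) that checks the explicit double loop from the original question against two vectorized forms: the broadcast fancy index above, and np.take_along_axis, which was added in NumPy 1.15 (long after this thread) and performs the same per-row lookup along axis 1 that the question attributes to PDL's $B->index($A). Neither vectorized form builds a 3D intermediate; the result has A's shape and B's dtype. The last line shows why plain B[A] grows a dimension: it indexes only the first axis of B.

```python
import numpy as np

# Hypothetical small sizes standing in for 10000 objects, 1000 lookups, 100 values
n_obj, n_look, n_vals = 8, 5, 4

rng = np.random.default_rng(42)
B = rng.random((n_obj, n_vals))
A = rng.integers(0, n_vals, size=(n_obj, n_look))

# Reference: the explicit double loop from the original question
C_loop = np.empty(A.shape, dtype=B.dtype)
for x in range(n_obj):
    for y in range(n_look):
        C_loop[x, y] = B[x, A[x, y]]

# Fancy indexing with a broadcast row index
C_fancy = B[np.arange(n_obj)[:, None], A]

# take_along_axis (NumPy >= 1.15): same per-row lookup along axis 1
C_take = np.take_along_axis(B, A, axis=1)

assert np.array_equal(C_loop, C_fancy)
assert np.array_equal(C_loop, C_take)

# Plain B[A] indexes only the first axis of B: its shape is A.shape + B.shape[1:]
print(B[A].shape)  # (8, 5, 4)
```

np.take_along_axis also works along other axes and for higher-dimensional arrays, as long as the index array has the same number of dimensions as the data array.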

Josef

>
> Now I am using an iteration over all 10000 elements:
>
> C = np.empty(A.shape, dtype=B.dtype)
> for i in range(10000):
>    C[i] = B[i][A[i]]
>
> which works perfectly. It is just a real pain to see such a for-loop in
> the NumPy world :-(
>
> Thanks,
> Miroslav
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>