[SciPy-User] reversing or avoiding the need to reverse argsort()

Sat Mar 20 12:59:21 EDT 2010

On 20 March 2010 12:41, Vincent Davis <vincent at vincentdavis.net> wrote:

> @Anne Archibald
> I get an error and not sure way, or I should say first I need to better
> understand the indexing then I might be able to fix it.

Ah, oops. I got the indexing wrong. numpy's broadcasting needs a little
help:

AA[I,np.arange(A.shape[1])] =
np.mean(A[I,np.arange(A.shape[1])],axis=1)[:,np.newaxis]

What's going on in this broadcasting is several things.

First, when you argsort a high-dimensional array along one axis, you get an
array the same shape as what you started with, containing the indices along
that axis. But to use fancy indexing on a two-dimensional array, the way it
works is:

A[B, C]

That is, B and C should be arrays the same shape as each other (but not
necessarily the same shape as A). The result will have the same shape as B
and C; what you get is (say B and C are three-dimensional):

(A[B, C]) [i,j,k] = A[B[i,j,k], C[i,j,k]]

So in your case, the argsorted array I is exactly what should go in the
first place. What you really want is:

B[i,j] = A[np.argsort(A,axis=0)[i,j], j]

So you want to make up an array for the second index whose (i,j)th entry is
just j. But I supplied np.arange(A.shape[1]); this is only a one-dimensional
array. Rather than complain that the index arrays don't match up with each
other, numpy tries adding a "fake" axis to the beginning, so that it looks
like a two-dimensional array M that just ignores its first index:  M[i,j] =
np.arange(A.shape[1])[j] =
Which is what you want.

All this was correct in the original email. What I got wrong was the mean:
taking the mean across each row leaves you with a one-dimensional array,
which we try to stuff into a two-dimensional array. So numpy tries its trick
of adding an axis at the beginning, only to find out that that doesn't work.
We needed to add an axis at the end, which numpy won't automatically do for
us. So we do it by using np.newaxis.

Anne

In [3]: def quantile_normalization(anarray):
>    ...:             """ anarray with samples in the columns and probes
> across the rows"""
>    ...:         A=anarray
>    ...:         AA = np.zeros_like(A)
>    ...:         I = np.argsort(A,axis=0)
>    ...:         AA[I,np.arange(A.shape[1])] =
> np.mean(A[I,np.arange(A.shape[1])],axis=1)
>    ...:         return AA
>    ...:
>
> In [4]: x = np.array([[1,2,3,4],[4,3,2,1],[1,1,1,1]])
>
> In [5]: quantile_normalization(x)
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
>
> /Users/vmd/<ipython console> in <module>()
>
> /Users/vmd/<ipython console> in quantile_normalization(anarray)
>
> ValueError: array is not broadcastable to correct shape
>
> In [6]:
>
>   *Vincent Davis
> 720-301-3003 *
> vincent at vincentdavis.net
>  my blog <http://vincentdavis.net> | LinkedIn<http://www.linkedin.com/in/vincentdavis>
>
>
> On Sat, Mar 20, 2010 at 10:09 AM, Anne Archibald <
> peridot.faceted at gmail.com> wrote:
>
>>
>>
>> On 20 March 2010 11:37, Vincent Davis <vincent at vincentdavis.net> wrote:
>>
>>> What I want to do is simple but not sure the best way to go about it.
>>>
>>> I have an array a, shape(a) = rXc I need to sort each column
>>> aargsort = a.argsort(axis=0)  # May use this later
>>> aSort = a.sort(axis=0)
>>>
>>> now average each row
>>> aSortRM = asort.mean(axis=1)
>>>
>>> now replace each col in a row with the row mean.
>>> is there a better way than this
>>>
>>> aWithMeans = ones_like(a)
>>> for ind in range(r)  # r = number of rows
>>>     aWithMeans[ind]* aSortRM[ind]
>>>
>>> Now I need to undo the sort I did in the first step.
>>> I think this is where I need the aargsort (argsort of a before any
>>> modification) I am kinda stumped on this step. Everything I think of is
>>> rather convoluted.
>>> Is it possible to do this whole process without actually rearranging the
>>> original array and just using the index info from the argsort() and would I
>>> want to do this?
>>>
>>
>> First of all, there's no need to use both argsort and sort:
>>
>> In [2]: A = np.random.randn(1000)
>>
>> In [3]: I = A.argsort()
>>
>> In [4]: B = A[I]
>>
>> In [5]: np.all(np.diff(B)>=0)
>> Out[5]: True
>>
>> Now, to invert the permutation I, the easiest way is just to argsort() it;
>> but this is N log N instead of order N, so there's another, slightly more
>> cumbersome, way to do it:
>>
>> In [12]: Ij = np.zeros_like(I)
>>
>> In [13]: Ij[I] = np.arange(len(I))
>>
>> In [23]: AA = B[Ij]
>>
>> In [24]: np.all(AA==A)
>> Out[24]: True
>>
>> Alternatively, you can avoid ever creating the inverse permutation:
>>
>> In [25]: AA[I] = B
>>
>> In [26]: np.all(AA==A)
>> Out[26]: True
>>
>> All of this is somewhat more painful with two-dimensional arrays, since
>> fancy and normal indexing do not mix well, and argsort returns something
>> peuliar, but it can be made to work:
>>
>> In [51]: A = np.random.randn(100,100)
>>
>> In [52]: I = np.argsort(A, axis=0)
>>
>> In [54]: B=A[I,np.arange(A.shape[1])]
>>
>> In [55]: AA = np.zeros_like(A)
>>
>> In [56]: AA[I,np.arange(A.shape[1])] = B
>>
>> In [57]: np.all(AA==A)
>> Out[57]: True
>>
>> Now for processing B in place, that's a simpler problem:
>>
>> In [58]: BB = np.zeros_like(B)
>>
>> In [59]: BB[...] = np.mean(B,axis=1)
>>
>> Here numpy's broadcasting rules take care of duplicating the entries of B
>> as needed to fill out BB. In fact we can dispense with BB entirely:
>>
>> In [60]: AA[I,np.arange(A.shape[1])] = np.mean(B,axis=1)
>>
>> So your code can be rewritten as the impenetrable three lines:
>>
>> In [61]: AA = np.zeros_like(A)
>>
>> In [62]: I = np.argsort(A,axis=0)
>>
>> In [63]: AA[I,np.arange(A.shape[1])] =
>> np.mean(A[I,np.arange(A.shape[1])],axis=1)
>>
>> Or, even less clearly, making I into what I think argsort should probably
>> return:
>>
>> In [64]: I = np.argsort(A,axis=0), np.arange(A.shape[1])
>>
>> In [65]: AA[I] = np.mean(A[I],axis=1)
>>
>>
>> Anne
>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>>
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.scipy.org/pipermail/scipy-user/attachments/20100320/551e0e01/attachment.html>