[SciPy-User] pandas: independent row sorting of data frame

Fabrizio Pollastri f.pollastri at inrim.it
Mon Jan 3 16:21:19 EST 2011


Wes McKinney <wesmckinn <at> gmail.com> writes:
> 
> Hi Fabrizio,
> 
> I'm not that familiar with xts but I think you need only do:
> 
> sort_xs = df.apply(np.sort, axis=1)
> sort_index = df.apply(np.argsort, axis=1)
> 
> Using the apply function with the axis argument is preferable to using
> tapply-- that function is still around to support old client code (I
> may add a deprecation warning in the future).
> 
> This will only be about as fast as the R counterpart-- it would be
> easy to write a more optimized version, though.
> 
> NB many NumPy functions work using the array interface, e.g.:
> 
> np.argsort(df, axis=1)
> 
> But np.sort isn't one of them.
> 
> HTH,
> Wes
> 
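Wes mentions that a more optimized version would be easy to write. One way to do that is to skip `apply` and sort the whole 2-D array along `axis=1` in a single call. This is a minimal sketch that assumes a recent pandas (for `DataFrame.to_numpy`) and NumPy (for `np.take_along_axis`). `sort_rows` is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd


def sort_rows(df):
    """Sort each row of ``df`` independently (hypothetical helper).

    Returns two DataFrames with the same index and columns as ``df``:
    the row-wise sorted values and the row-wise argsort positions.
    """
    values = df.to_numpy()
    # One argsort over the whole 2-D array; axis=1 sorts within each row.
    order = np.argsort(values, axis=1, kind="stable")
    # Reuse the positions to gather the sorted values instead of sorting twice.
    sorted_values = np.take_along_axis(values, order, axis=1)
    sorted_df = pd.DataFrame(sorted_values, index=df.index, columns=df.columns)
    order_df = pd.DataFrame(order, index=df.index, columns=df.columns)
    return sorted_df, order_df


df = pd.DataFrame({"a": [1, 3, 1], "b": [2, 2, 3], "c": [3, 1, 2]})
sorted_df, order_df = sort_rows(df)
print(sorted_df)
print(order_df)
```

Because each row is sorted independently, the column labels on the sorted frame only mark positions; they no longer name the original columns. The argsort frame records which original column position each sorted value came from.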

Hi Wes,

thanks for your hints, but I have some problems with sort.
Let's look at the following code.

>>> import numpy as np
>>> from pandas import DataFrame
>>> df = DataFrame({'a': [1, 3, 1], 'b': [2, 2, 3], 'c': [3, 1, 2]})
>>> df
   a  b  c
0  1  2  3
1  3  2  1
2  1  3  2

# The argsort result is as expected.
>>> sort_index = df.apply(np.argsort, axis=1)
>>> sort_index
   a  b  c
0  0  1  2
1  2  1  0
2  0  2  1

# The sorted frame is not as expected: it is identical to df.
>>> sorted_df = df.apply(np.sort, axis=1)
>>> sorted_df
   a  b  c
0  1  2  3
1  3  2  1
2  1  3  2


What am I missing?
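On a recent pandas release, one way to keep `apply` and still get the sorted rows back is to pass `result_type="broadcast"`, which, per the pandas documentation, broadcasts the results to the original shape and retains the original index and columns. This is a sketch against current pandas, not the 2011 version used in the session above, and whether it accounts for the behavior seen there is an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 1], "b": [2, 2, 3], "c": [3, 1, 2]})

# Each row is passed to np.sort, which returns a plain sorted ndarray.
# result_type="broadcast" writes those arrays back into a frame with the
# original index and column labels, so every row comes out sorted.
sorted_df = df.apply(np.sort, axis=1, result_type="broadcast")
print(sorted_df)
```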


TIA,
Fabrizio