[SciPy-User] nan's in stats.spearmanr

josef.pktd at gmail.com josef.pktd at gmail.com
Wed Apr 4 18:34:01 EDT 2012


On Wed, Apr 4, 2012 at 3:54 PM, Ben <benwhalley at gmail.com> wrote:
> Apologies if this seems obvious to others, but I'm using both functions from
> pandas and stats.spearmanr in different bits of my code and noticed something
> odd.  Is the following output expected?
>
> from numpy import nan
> from pandas import DataFrame
> from scipy import stats
> a = [1, nan, 2]
> b = [1, 2, 2]
> df = DataFrame(list(zip(a, b)))
> stats.spearmanr(a,b)
>
> gives: (0.86602540378443871, 0.3333333333333332)
>
> df.corr(method="spearman")
>   0  1
> 0  1  1
> 1  1  1
>
> Removing the nan from a produces identical results. I had expected the first
> output, but perhaps I'm not understanding how scipy likes to handle nan.

scipy.stats doesn't handle nans in most cases; they get no special
treatment, so the outcome depends on the implementation details

the correct answer should be in stats.mstats, which uses masked arrays
to handle nan cases

>>> import numpy as np
>>> from scipy import stats
>>> am = np.ma.fix_invalid(a)
>>> bm = np.ma.fix_invalid(b)
>>> stats.mstats.spearmanr(am, bm)
(1.0, 0.0)
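
For what it's worth, the 0.866 can be reproduced by hand if the nan is
simply ranked as the largest value, while dropping the incomplete pair
first gives the 1.0 that pandas and mstats report. The sketch below is
pure Python and only illustrates those two treatments; it is not the
scipy or pandas implementation, and the helper names (average_ranks,
pearson, spearman_naive, spearman_pairwise) are made up for this
example. That the nan ends up ranked last is an assumption about how
the sort behaved here.

```python
import math

nan = float("nan")


def average_ranks(values):
    """Rank values 1..n, averaging over ties. nan is ranked last (assumption)."""
    def key(i):
        v = values[i]
        # nans sort after every number; several nans would tie with each other
        return (math.isnan(v), 0.0 if math.isnan(v) else v)

    order = sorted(range(len(values)), key=key)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(order) and key(order[j + 1]) == key(order[i]):
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_naive(a, b):
    """Rank everything, nan included, then correlate the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def spearman_pairwise(a, b):
    """Drop every pair that contains a nan, then rank and correlate."""
    pairs = [(p, q) for p, q in zip(a, b)
             if not (math.isnan(p) or math.isnan(q))]
    a_ok = [p for p, _ in pairs]
    b_ok = [q for _, q in pairs]
    return pearson(average_ranks(a_ok), average_ranks(b_ok))


a = [1, nan, 2]
b = [1, 2, 2]
naive = spearman_naive(a, b)        # about 0.866
pairwise = spearman_pairwise(a, b)  # 1.0
```

In the naive case a gets ranks [1, 3, 2] and b gets [1, 2.5, 2.5], which
gives 1.5 / sqrt(3), about 0.866. With the incomplete pair dropped only
(1, 1) and (2, 2) are left, so the rank correlation is 1.0.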

Josef


>
> Any advice much appreciated.
>
> Regards,
>
> Ben