[SciPy-User] nan's in stats.spearmanr
josef.pktd at gmail.com
Wed Apr 4 18:34:01 EDT 2012
On Wed, Apr 4, 2012 at 3:54 PM, Ben <benwhalley at gmail.com> wrote:
> Apologies if this seems obvious to others, but I'm using both functions from
> pandas and stats.spearmanr in different bits of my code and noticed something
> odd. Is the following output expected?
>
> from numpy import nan
> from pandas import DataFrame
> from scipy import stats
> a = [1, nan, 2]
> b = [1, 2, 2]
> df = DataFrame(zip(a,b))
> stats.spearmanr(a,b)
>
> gives: (0.86602540378443871, 0.3333333333333332)
>
> df.corr(method="spearman")
>    0  1
> 0  1  1
> 1  1  1
>
> Removing the nan from a produces identical results. I had expected the first
> output, but perhaps I'm not understanding how scipy likes to handle nan.
scipy.stats doesn't handle nans in most cases; they are not given any
special treatment, so the outcome depends on the implementation details.
The correct answer should come from stats.mstats, which uses masked arrays
to handle the nan cases:
>>> import numpy as np
>>> am = np.ma.fix_invalid(a)
>>> bm = np.ma.fix_invalid(b)
>>> stats.mstats.spearmanr(am, bm)
(1.0, 0.0)
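
A slightly longer sketch of the same idea, in case it helps. The data here is made up for illustration (it is not the example above), and it assumes that dropping every pair with a missing value is the behaviour you want. It compares dropping the nan pairs by hand with the masked-array route through stats.mstats:

```python
import numpy as np
from scipy import stats

# Illustrative data (an assumption, not from the thread): one nan in `a`.
a = np.array([1.0, np.nan, 2.0, 4.0, 3.0, 5.0])
b = np.array([1.0, 2.0, 2.0, 5.0, 3.0, 4.0])

# Option 1: pairwise deletion. Keep only the positions where both
# values are valid, then call the plain scipy.stats function.
valid = ~(np.isnan(a) | np.isnan(b))
rho_drop, p_drop = stats.spearmanr(a[valid], b[valid])

# Option 2: mask the invalid entries and let stats.mstats skip them.
am = np.ma.fix_invalid(a)
bm = np.ma.fix_invalid(b)
rho_masked, p_masked = stats.mstats.spearmanr(am, bm)

# Both routes should agree on the correlation coefficient.
print(float(rho_drop), float(rho_masked))
```

Option 1 makes the missing-data rule explicit in your own code, while option 2 leaves it to the masked-array machinery.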
Josef
>
> Any advice much appreciated.
>
> Regards,
>
> Ben