[SciPy-user] help with scipy.stats.mannwhitneyu

josef.pktd at gmail.com josef.pktd at gmail.com
Thu Feb 5 19:03:34 EST 2009


On Thu, Feb 5, 2009 at 3:54 PM,  <josef.pktd at gmail.com> wrote:
>>
>> sample size 20, 9 ties
>> this is with R wilcox.exact, ranksums is your ranksum
> ...
>>
>> With this correction, the normal distribution based p-value in
>> ranksums looks exactly the same as stats.mannwhitneyu.
>
> this statement is not correct.
>
> I mixed up my variables and didn't actually have ties, now with ties,
> I still get essentially but not exactly the same results.
>

I think there is a mistake in the tie handling of stats.mannwhitneyu.
In the calculation of the standard error, the sqrt is taken twice:

    T = np.sqrt(tiecorrect(ranked))  # correction factor for tied scores
    if T == 0:
        raise ValueError, 'All numbers are identical in amannwhitneyu'
    sd = np.sqrt(T*n1*n2*(n1+n2+1)/12.0)

I don't have the formulas for the tie correction, but from looking at
the tie correction in Sturla's version of ranksums, it seems that the
first sqrt shouldn't be there.

Can someone with access to the correct references verify this?
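
A minimal sketch for comparing the two variants numerically, not a
verified fix. It assumes the commonly cited normal-approximation
variance with ties, n1*n2/12 * ((N+1) - sum(t**3 - t)/(N*(N-1))), and
that tiecorrect returns the factor 1 - sum(t**3 - t)/(N**3 - N). The
names mwu_sd and double_sqrt are hypothetical, used only for this
illustration.

```python
import numpy as np
from scipy.stats import rankdata, tiecorrect


def mwu_sd(x, y, double_sqrt=False):
    """Standard deviation of U under H0, with tie correction.

    double_sqrt=True reproduces the variant quoted above, where the
    tie factor is square-rooted before entering the variance.
    """
    n1, n2 = len(x), len(y)
    ranked = rankdata(np.concatenate((x, y)))
    T = tiecorrect(ranked)  # assumed: 1 - sum(t**3 - t) / (N**3 - N)
    if T == 0:
        raise ValueError("All numbers are identical")
    if double_sqrt:
        T = np.sqrt(T)  # the extra sqrt in question
    return np.sqrt(T * n1 * n2 * (n1 + n2 + 1) / 12.0)


# Small sample with ties: 2 occurs three times, 3 occurs twice.
x = [1, 2, 2, 3]
y = [2, 3, 4, 5]
sd_single = mwu_sd(x, y)
sd_double = mwu_sd(x, y, double_sqrt=True)
```

Without ties the factor is 1, so both variants agree. With ties the
factor lies between 0 and 1, so the extra sqrt shrinks the correction
and gives a slightly larger sd, which would fit "essentially but not
exactly the same" results.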

Josef


