[SciPy-user] help with scipy.stats.mannwhitneyu

Thu Feb 5 19:03:34 EST 2009

On Thu, Feb 5, 2009 at 3:54 PM,  <josef.pktd at gmail.com> wrote:
>>
>> sample size 20, 9 ties
>> this is with R wilcox.exact, ranksums is your ranksum
> ...
>>
>> With this correction, the normal distribution based p-value in
>> ranksums looks exactly the same as stats.mannwhitneyu.
>
> This statement is not correct.
>
> I mixed up my variables and didn't actually have ties. Now, with ties,
> I still get essentially, but not exactly, the same results.
>

I think there is a mistake in the tie handling of stats.mannwhitneyu.
In the calculation of the standard error, the sqrt is taken twice:

    T = np.sqrt(tiecorrect(ranked))  # correction factor for tied scores (first sqrt)
    if T == 0:
        raise ValueError('All numbers are identical in amannwhitneyu')
    sd = np.sqrt(T*n1*n2*(n1+n2+1)/12.0)  # second sqrt: tiecorrect enters as its 4th root
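
A minimal pure-Python sketch of the effect, to make the size of the discrepancy concrete. `tie_factor`, `sd_as_quoted`, `sd_single_sqrt` and the two samples are hypothetical; `tie_factor` is assumed to compute the same quantity as `tiecorrect`, i.e. 1 - sum(t**3 - t)/(n**3 - n) over groups of t tied values in the pooled sample.

```python
import math
from collections import Counter


def tie_factor(values):
    """Hypothetical helper: 1 - sum(t**3 - t) / (n**3 - n) over tie groups.

    Assumed to match tiecorrect applied to the pooled ranks: tied values
    receive tied (average) ranks, so counting equal values gives the same t's.
    """
    n = len(values)
    tied = sum(t**3 - t for t in Counter(values).values())
    return 1.0 - tied / float(n**3 - n)


def sd_as_quoted(n1, n2, tc):
    # Mirrors the quoted code: the tie factor is square-rooted first ...
    T = math.sqrt(tc)
    # ... and then the whole product is square-rooted again.
    return math.sqrt(T * n1 * n2 * (n1 + n2 + 1) / 12.0)


def sd_single_sqrt(n1, n2, tc):
    # Tie factor applied to the variance, with a single square root.
    return math.sqrt(tc * n1 * n2 * (n1 + n2 + 1) / 12.0)


x = [1, 2, 2, 3, 5, 5, 5, 8, 9, 10]   # hypothetical sample 1
y = [2, 3, 3, 4, 5, 6, 7, 7, 11, 12]  # hypothetical sample 2
T = tie_factor(x + y)
print("tie factor:", T)
print("sd as quoted:", sd_as_quoted(len(x), len(y), T))
print("sd, single sqrt:", sd_single_sqrt(len(x), len(y), T))
```

Without ties the factor is 1 and both versions agree. With ties the factor lies between 0 and 1, so its square root is larger than the factor itself and the quoted code gives a larger standard error than the single-sqrt version. With only a few ties, as in this made-up sample, the difference is small, which would fit results that are close but not identical.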

I don't have the formulas for the tie correction, but judging from the tie
correction in Sturla's version of ranksums, it seems that the first sqrt
shouldn't be there.
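
For comparison, here is the tie-corrected variance of U in the form commonly given for the normal approximation. It should still be checked against the references. Here N = n1 + n2 and t_j is the size of the j-th group of tied values in the pooled sample. The correction factor is applied once, to the variance:

```latex
\operatorname{Var}(U) = \frac{n_1 n_2 (N + 1)}{12}
  \left(1 - \frac{\sum_j (t_j^3 - t_j)}{N^3 - N}\right),
\qquad N = n_1 + n_2
```

The bracketed factor is what tiecorrect returns. Under this form, the standard error is sqrt(tiecorrect * n1*n2*(n1+n2+1)/12), with no separate square root on the factor itself.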

Can someone with access to the relevant references verify this?

Josef