[SciPy-User] Questions/comments about scipy.stats.mannwhitneyu

Fri Feb 15 11:35:03 EST 2013

On Fri, Feb 15, 2013 at 11:16 AM,  <josef.pktd at gmail.com> wrote:
> On Thu, Feb 14, 2013 at 7:06 PM, Chris Rodgers <xrodgers at gmail.com> wrote:
>> Hi all
>>
>> I use scipy.stats.mannwhitneyu extensively because my data is not at
>> all normal. I have run into a few "gotchas" with this function and I
>> wanted to discuss possible workarounds with the list.
>
> Can you open a ticket? http://projects.scipy.org/scipy/report
>
> I partially agree, but any changes won't be backwards compatible, and
> I don't have time to think about this enough.
>
>>
>> 1) When this function returns a significant result, it is non-trivial
>> to determine the direction of the effect! The Mann-Whitney test is NOT
>> a test on difference of medians or means, so you cannot determine the
>> direction from these statistics. Wikipedia has a good example of why
>> it is not a test for difference of median.
>> http://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U#Illustration_of_object_of_test
>>
>> I've reprinted it here. The data are the finishing order of hares and
>> tortoises. Obviously this is contrived but it indicates the problem.
>> First the setup:
>> results_l = ('H H H H H H H H H T T T T T T T T T T H H H H H H H H H H '
>>              'T T T T T T T T T').split(' ')
>> h = [i for i in range(len(results_l)) if results_l[i] == 'H']
>> t = [i for i in range(len(results_l)) if results_l[i] == 'T']
>>
>> And the results:
>> In [12]: scipy.stats.mannwhitneyu(h, t)
>> Out[12]: (100.0, 0.0097565768849708391)
>>
>> In [13]: np.median(h), np.median(t)
>> Out[13]: (19.0, 18.0)
>>
>> Hares are significantly faster than tortoises, but we cannot determine
>> this from the output of mannwhitneyu. This could be fixed by either
>> returning u1 and u2 from the guts of the function, or testing them in
>> the function and returning the comparison. My current workaround is
>> testing the means which is absolutely wrong in theory but usually
>> correct in practice.
>
> In some cases I'm reluctant to return the direction when we use a
> two-sided test. In this case we don't have a one-sided test.
> In analogy to the t-tests, I think we could return the individual u1, u2.
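
A minimal sketch of that idea, not the scipy implementation: compute both
U statistics from the pooled ranks so that the direction is visible. The
helper name `u_statistics` is made up for illustration; it assumes midranks
for ties (the default of `scipy.stats.rankdata`) and counts a tied pair as 1/2.

```python
import numpy as np
from scipy.stats import rankdata


def u_statistics(x, y):
    """Return (u_x, u_y) for two samples (hypothetical helper, not scipy API).

    u_x counts the (x, y) pairs with x > y, ties counted as 1/2;
    u_y counts the pairs with y > x, so u_x + u_y == len(x) * len(y).
    """
    x = np.asarray(x)
    y = np.asarray(y)
    n_x, n_y = len(x), len(y)
    # Rank the pooled sample; tied values receive their average rank.
    ranks = rankdata(np.concatenate([x, y]))
    rank_sum_x = ranks[:n_x].sum()
    u_x = float(rank_sum_x - n_x * (n_x + 1) / 2.0)
    u_y = float(n_x * n_y - u_x)
    return u_x, u_y


# Finishing order from the hare/tortoise example quoted above.
results_l = list('H' * 9 + 'T' * 10 + 'H' * 10 + 'T' * 9)
h = [i for i, r in enumerate(results_l) if r == 'H']
t = [i for i, r in enumerate(results_l) if r == 'T']

u_h, u_t = u_statistics(h, t)
print(u_h, u_t)
```

Here u_h is 100 and u_t is 261: a hare has the larger finishing index in only
100 of the 19 * 19 pairs, so hares tend to finish earlier even though their
median position is later. The smaller of the two values is the statistic
shown in Out[12].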

to expand a bit:
For the Kolmogorov-Smirnov test, we refused to return an indication of
the direction. The alternative is two-sided, and both the test
statistic and its distribution are different in the one-sided test.
So we shouldn't draw any one-sided conclusions from the two-sided test.

In the t-test and mannwhitneyu the test statistic is normally
distributed (in large samples), so we can infer the one-sided test
from the two-sided statistic and p-value.
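
The large-sample argument can be sketched as follows. This is an
illustration under stated assumptions, not the code in scipy.stats: the
function name `mwu_normal_approx` is hypothetical, there is no tie
correction in the variance, and a 0.5 continuity correction is applied.

```python
import numpy as np
from scipy.stats import norm, rankdata


def mwu_normal_approx(x, y):
    """Signed z and p-values from the normal approximation (sketch only).

    Assumptions: no tie correction, 0.5 continuity correction.
    A negative z means that x tends to be smaller than y.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    n_x, n_y = len(x), len(y)
    ranks = rankdata(np.concatenate([x, y]))
    u_x = float(ranks[:n_x].sum() - n_x * (n_x + 1) / 2.0)

    mean_u = n_x * n_y / 2.0
    sd_u = np.sqrt(n_x * n_y * (n_x + n_y + 1) / 12.0)

    # Shrink the deviation from the mean by 0.5, but keep its sign.
    diff = u_x - mean_u
    z = float(np.sign(diff) * max(abs(diff) - 0.5, 0.0) / sd_u)

    p_one_sided = float(norm.sf(abs(z)))       # tail in the observed direction
    p_two_sided = min(1.0, 2.0 * p_one_sided)  # symmetric null distribution
    return u_x, z, p_one_sided, p_two_sided


results_l = list('H' * 9 + 'T' * 10 + 'H' * 10 + 'T' * 9)
h = [i for i, r in enumerate(results_l) if r == 'H']
t = [i for i, r in enumerate(results_l) if r == 'T']

u_h, z, p_one, p_two = mwu_normal_approx(h, t)
print(u_h, z, p_one, p_two)
```

Because the normal distribution is symmetric, the two-sided p-value is twice
the one-sided one, and the sign of z carries the direction.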

If there are tables for the small sample case, we would need to check
if we get consistent interpretation between one- and two-sided tests.
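
For small samples without ties, the exact null distribution of U can be
enumerated with the standard counting recurrence, which would allow that
consistency check. A rough sketch; `count_u` and `exact_u_pvalues` are
hypothetical names, not part of scipy, and doubling the smaller tail is
only one possible two-sided convention.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_u(u, n, m):
    """Number of orderings of n x's and m y's (no ties) with U_x == u.

    U_x is the number of (x, y) pairs with x > y. Condition on the largest
    pooled value: if it is an x it adds m to U_x, if it is a y it adds 0.
    """
    if u < 0:
        return 0
    if n == 0 or m == 0:
        return 1 if u == 0 else 0
    return count_u(u - m, n - 1, m) + count_u(u, n, m - 1)


def exact_u_pvalues(u, n, m):
    """Return (P(U <= u), P(U >= u), two-sided p) under the null hypothesis.

    u must be an integer, i.e. the data must be free of ties.
    """
    counts = [count_u(k, n, m) for k in range(n * m + 1)]
    total = sum(counts)
    p_lower = sum(counts[:u + 1]) / total
    p_upper = sum(counts[u:]) / total
    p_two_sided = min(1.0, 2.0 * min(p_lower, p_upper))
    return p_lower, p_upper, p_two_sided


# Most extreme outcome for two samples of size 3: every x below every y.
p_lower, p_upper, p_two = exact_u_pvalues(0, 3, 3)
print(p_lower, p_upper, p_two)
```

The recursion depth is at most n + m, so this is only meant for the small
sample sizes where the normal approximation is in doubt.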

Josef

>
>>
>> 2) The documentation states that the sample sizes must be at least 20.
>> I think this is because the normal approximation for U is not valid
>> for smaller sample sizes. Is there a table of critical values for U in
>> scipy.stats that is appropriate for small sample sizes or should the
>> user implement his or her own?
>
> Not available in scipy. I never looked at this.
> Pull requests for this are welcome if it works. It would be backwards
> compatible.
>
>>
>> 3) This is picky but is there a reason that it returns a one-tailed
>> p-value, while other tests (eg ttest_*) default to two-tailed?
>
> Legacy wart that I don't like, but it wasn't offending me enough to change it.
>
>>
>>
>> Thanks for any thoughts, tips, or corrections and please don't take
>> these comments as criticisms ... if I didn't enjoy using scipy.stats
>> so much I wouldn't bother bringing this up!
>
> Thanks for the feedback.
> In large part, review of the functions relies on comments by users
> (and future contributors).
>
> The main problem is how to make changes without breaking current
> usage, since many of those functions are widely used.
>
> Josef
>
>
>>
>> Chris
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user