[SciPy-Dev] anyone want to fix Mann-Whitney test?

Sun Feb 5 09:49:34 EST 2012

On Sun, Feb 5, 2012 at 9:28 AM, <josef.pktd at gmail.com> wrote:
>
>
>
> On Sun, Feb 5, 2012 at 8:28 AM, Ralf Gommers <ralf.gommers at googlemail.com> wrote:
>>
>>
>>
>> On Sun, Feb 5, 2012 at 1:19 PM, <josef.pktd at gmail.com> wrote:
>>>
>>>
>>>
>>> On Sun, Feb 5, 2012 at 5:17 AM, Ralf Gommers <ralf.gommers at googlemail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> There's a bug report and a number of new tests for mannwhitneyu at http://projects.scipy.org/scipy/ticket/1593. These plus a fix were contributed by Sebastian Pölsterl; unfortunately, he based his initial fix on GPL'ed R code. Therefore I think we can't use that, even after he modified it. I looked at the GPL code too, so I think we need someone who hasn't seen it to implement a new fix based only on the tests and the bug report.
>>>>
>>>> Any takers?
>>>
>>>
>>> From what I remember, my impression is that this is only a "cosmetic" change, or rather a change in which statistic is returned.
>>>
>>> >>> v, pval = stats.mannwhitneyu(x, y)
>>> >>> len(x)*len(y) - v
>>> 498.0
>>
>>
>> Ah, okay. I'm not sure if this is a desirable change then. Any idea why it was implemented like this?
>
>
> No, I was just fixing bugs. This was one of the early tests I worked on, when I didn't yet have strong opinions about what the standard or more informative return values are. Since the p-values are correct, I didn't care too much about which test statistic is reported.
>
> Looking a bit closer, I'm in favor of the change. Returning the shorter tail instead of the requested tail in a one-sided test is not really "clean", and while trying to rewrite this I found it hard to figure out which statistic is which, 210 or 498. I haven't finished yet. I like requests with a full test suite.
>
> If I remember correctly, we almost always return the two-sided p-value, so adding an option for one-sided tests will be backwards compatible; for mannwhitneyu, though, that might not be possible.

A rewrite as a standalone function is attached.

The last test was missing a self argument.

I also had a test failure at first, because I preferred the keyword
arguments in the reverse order and the tests pass a keyword argument
positionally.
I also just tried to match the tests, without trying to understand
every detail again.
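
To illustrate the keyword-order problem, here is a minimal sketch. The
function names and the two options (use_continuity, alternative) are
hypothetical and chosen only for illustration; they are not taken from
the attachment or the ticket. It shows how a positional third argument
binds to a different option when the keyword order is reversed, and how
keyword-only options turn that silent change into an error.

```python
def u_test_v1(x, y, use_continuity=True, alternative="two-sided"):
    """Hypothetical signature with the options in one order."""
    return {"use_continuity": use_continuity, "alternative": alternative}


def u_test_v2(x, y, alternative="two-sided", use_continuity=True):
    """Same options as u_test_v1, but in the reverse order."""
    return {"use_continuity": use_continuity, "alternative": alternative}


def u_test_kwonly(x, y, *, use_continuity=True, alternative="two-sided"):
    """Options are keyword-only, so positional use is rejected."""
    return {"use_continuity": use_continuity, "alternative": alternative}


x, y = [1, 2, 3], [4, 5, 6]

# A test that passes the third argument positionally means different
# things depending on the order of the keywords in the signature.
r1 = u_test_v1(x, y, False)  # binds use_continuity=False
r2 = u_test_v2(x, y, False)  # binds alternative=False, not what was meant

# With keyword-only options the same call fails loudly instead.
try:
    u_test_kwonly(x, y, False)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

Keyword-only arguments need Python 3; they are shown here only as one
possible way to make the order of the options irrelevant.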

I think it would be better if the default were two-sided, but this would
double the reported p-value compared to the current version.
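
As a rough illustration of the points above (which U is reported, and
one-sided versus two-sided p-values), here is a stdlib-only sketch. It
is not the attached try_mannwhitenyu.py and not scipy's implementation.
It assumes the plain normal approximation with no tie correction and no
continuity correction, and the names mwu_sketch, average_ranks and
norm_sf, as well as the sample data, are made up for this example.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def norm_sf(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def mwu_sketch(x, y, alternative="two-sided"):
    """Return (U for x, p-value) using the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0  # U statistic of x
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean) / sd
    if alternative == "greater":  # x tends to be larger than y
        p = norm_sf(z)
    elif alternative == "less":  # x tends to be smaller than y
        p = norm_sf(-z)
    elif alternative == "two-sided":
        p = min(1.0, 2.0 * norm_sf(abs(z)))
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
    return u1, p


x = [1.1, 2.3, 2.9, 3.8, 4.0]
y = [3.1, 4.5, 5.2, 6.0, 6.6, 7.3]

u_x, p_two = mwu_sketch(x, y)
u_y, _ = mwu_sketch(y, x)
_, p_less = mwu_sketch(x, y, "less")
_, p_greater = mwu_sketch(x, y, "greater")

# The two U statistics always add up to len(x) * len(y), which is why
# reporting "the other" U is only a change in what is returned.
assert u_x + u_y == len(x) * len(y)
```

In this sketch the two-sided p-value is twice the smaller one-sided
p-value, and the two one-sided p-values sum to one, so asking for a
specific tail returns that tail rather than always the smaller one.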

>
>
>
>>>
>>>
>>> >>> pval*2
>>> 9.188326533255e-05
>>>
>>>
>>> docstring says:
>>>     The reported p-value is for a one-sided hypothesis, to get the two-sided
>>>     p-value multiply the returned p-value by 2.
>>>
>>> Currently, I think none of the tests that use the normal or t distribution have a one-sided versus two-sided option, but I think it could be added everywhere.
>>> One argument in favor of adding two one-sided options is that we return the correct tail instead of the smaller tail.
>>
>>
>> fisher_exact, kstest and ks_2samp have less/greater/two-sided. I also think it makes sense to add them where possible.
>
>
> None of these has a symmetric test distribution, AFAIR. So for those it's not easy to figure out how to move from the one-sided short tail to two-sided, or the other way around.
>
> Josef
>
>>
>>
>> Ralf
>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: try_mannwhitenyu.py
Type: text/x-python
Size: 8256 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20120205/49af78ce/attachment.py>