[SciPy-dev] Problem with F distribution, or with me?

Wed Aug 13 01:19:34 EDT 2008

I wanted to compare the distributions in numpy.random with those in
scipy.stats.distributions.
When I found the Kolmogorov-Smirnov test in test_distributions.py, I wondered
why this test had not caught the bug in the numpy random number generator.

It seems that this test is much too weak: the sample size is 30 and the
parameters are drawn between 1 and 2.
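
To illustrate why N = 30 gives so little power, here is a minimal sketch. It
is not taken from the scipy test suite: it uses the current numpy/scipy API
(numpy.random.default_rng, and stats.kstest called with a sample array), and
the helper name ks_pvalue and the shift of 0.2 are my own assumptions. It draws
from a slightly wrong normal distribution and tests against the standard normal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)

def ks_pvalue(n, shift=0.2):
    """p-value of a one-sample KS test of N(shift, 1) draws against N(0, 1).

    `shift` is a hypothetical, deliberately small misspecification.
    """
    sample = rng.normal(loc=shift, scale=1.0, size=n)
    return stats.kstest(sample, stats.norm.cdf).pvalue

p_small = ks_pvalue(30)      # sample size used by the old test
p_large = ks_pvalue(10000)   # sample size used by the stricter test

print("N=30:    pval =", p_small)
print("N=10000: pval =", p_large)
```

With the large sample the misspecification should be rejected at any usual
alpha, while with 30 draws it will often go unnoticed.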

After I made the test stricter to increase its power, I get the expected
rejection (test failure) for the F distribution, but I also get 2 to 4
additional failures: fatiguelife and loggamma fail in all runs, while
genhalflogistic and genextreme fail only sometimes. The results of an example
run are below. I did not see any obvious problem with my change to the test,
and the parameters used in the tests are not ruled out by anything I have seen
in the docstrings or in a quick Google search. I do not know these
distributions well enough to tell whether something is wrong with the
distributions or with the tests.
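
For checking a single distribution by hand, something like the following
sketch could be used. It assumes a current numpy/scipy (the Generator API and
kstest with a sample array), not the 2008 versions listed below, so it does
not reproduce the failure; it only shows the comparison of numpy.random
samples against the scipy.stats cdf. The F parameters are the ones from the
failing run.

```python
import numpy as np
from scipy import stats

# Parameters reported for the failing F-distribution test in the run below.
dfnum, dfden = 9.8771486774554127, 1.2819774801876884

rng = np.random.default_rng(0)
sample = rng.f(dfnum, dfden, size=100000)   # draws from numpy.random

# One-sample KS test of the numpy draws against the scipy.stats cdf.
result = stats.kstest(sample, stats.f(dfnum, dfden).cdf)
print("D =", result.statistic, "pval =", result.pvalue)
```

If the generator and the cdf describe the same distribution, D should be
small (on the order of 1/sqrt(N)) and pval should not be systematically near 0.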

Josef

I'm using
>>> numpy.version.version
'1.1.0'
>>> scipy.version.version
'0.6.0'

Failures with changed test_distributions.py
===============================

>>> execfile(r'C:\Programs\Python24\Lib\site-packages\scipy\stats\tests\test_distributions.py')
  Found 73/73 tests for stats.tests.test_distributions
  Found 10/10 tests for stats.tests.test_morestats
  Found 107/107 tests for stats.tests.test_stats
...................FF......F.F...............F.............................Ties preclude use of exact statistic.
..Ties preclude use of exact statistic.
.................................................................................................................
======================================================================
FAIL: check_cdf (stats.tests.test_distributions.test_f)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<string>", line 9, in check_cdf
AssertionError: D = 0.493585929987; pval = 0.0; alpha = 0.01
args = (9.8771486774554127, 1.2819774801876884)

======================================================================
FAIL: check_cdf (stats.tests.test_distributions.test_fatiguelife)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<string>", line 9, in check_cdf
AssertionError: D = 0.101323526498; pval = 0.0; alpha = 0.01
args = (3.3139748541207283,)

======================================================================
FAIL: check_cdf (stats.tests.test_distributions.test_genextreme)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<string>", line 9, in check_cdf
AssertionError: D = 0.02902; pval = 0.0; alpha = 0.01
args = (10.616290590132825,)

======================================================================
FAIL: check_cdf (stats.tests.test_distributions.test_genhalflogistic)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<string>", line 9, in check_cdf
AssertionError: D = 0.02343; pval = 0.0; alpha = 0.01
args = (8.4724627096253382,)

======================================================================
FAIL: check_cdf (stats.tests.test_distributions.test_loggamma)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<string>", line 9, in check_cdf
AssertionError: D = 1.0; pval = 0.0; alpha = 0.01
args = (4.4259066194420793,)

----------------------------------------------------------------------
Ran 190 tests in 5.250s

FAILED (failures=5)
>>>

3 Changes I made to scipy\stats\tests\test_distributions.py
===========================================
 * increase the spread of the random parameters by a factor of 10
 * increase the sample size N from 30 to 10000 (100000 on the retry)

Note: this is from scipy 0.6.0, but the same parameters are used in the
current trunk

{{{
for dist in dists:
    distfunc = eval('stats.'+dist)
    nargs = distfunc.numargs
    alpha = 0.01
    if dist == 'fatiguelife':
        alpha = 0.001
    if dist == 'erlang':
        args = str((4,)+tuple(rand(2)))
    elif dist == 'frechet':
        args = str(tuple(2*rand(1))+(0,)+tuple(2*rand(2)))
    elif dist == 'triang':
        args = str(tuple(rand(nargs)))
    elif dist == 'reciprocal':
        vals = rand(nargs)
        vals[1] = vals[0] + 1.0
        args = str(tuple(vals))
    else:
        args = str(tuple(1.0+rand(nargs)*10))        # old was without *10
    exstr = r"""
class test_%s(NumpyTestCase):
    def check_cdf(self):
        D,pval = stats.kstest('%s','',args=%s,N=10000)       # old was N=30
        if (pval < %f):
            D,pval = stats.kstest('%s','',args=%s,N=100000)      # old was N=30
            #if (pval < %f):
            #    D,pval = stats.kstest('%s','',args=%s,N=30)
        assert (pval > %f), "D = " + str(D) + "; pval = " + str(pval) + "; alpha = " + str(alpha) + "\nargs = " + str(%s)
""" % (dist,dist,args,alpha,dist,args,alpha,dist,args,alpha,args)
    exec exstr
}}}
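
For comparison, the same kind of check can be sketched without exec/eval. This
is only an illustration under my own assumptions: it uses the current scipy
API (rvs with random_state, kstest with a sample array), the helper name
check_cdf is reused here for a plain function rather than a test method, and
the three distribution names are just examples.

```python
import numpy as np
from scipy import stats

def check_cdf(name, args, n=10000, alpha=0.01, seed=0):
    """Draw n variates from stats.<name> and KS-test them against its own cdf.

    Returns (D, pval, passed); `passed` is True when pval > alpha.
    """
    dist = getattr(stats, name)               # replaces eval('stats.' + name)
    rng = np.random.default_rng(seed)
    sample = dist.rvs(*args, size=n, random_state=rng)
    res = stats.kstest(sample, dist.cdf, args=args)
    return res.statistic, res.pvalue, res.pvalue > alpha

param_rng = np.random.default_rng(1)
for name in ["norm", "gamma", "f"]:           # illustrative subset only
    nargs = getattr(stats, name).numargs      # number of shape parameters
    args = tuple(1.0 + 10 * param_rng.random(nargs))   # spread of 10, as above
    D, pval, passed = check_cdf(name, args)
    print(name, args, "D =", D, "pval =", pval, "passed =", passed)
```

Using getattr and a plain function keeps the random parameters as real tuples
instead of formatting them into source text.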