[Numpy-discussion] Bug in the F distribution?

Mon Jul 6 17:38:32 EDT 2009

On Fri, Jul 3, 2009 at 10:21 PM, Alan Jackson<alan at ajackson.org> wrote:
> I've tried the same scheme using R and it seems to give the right
> answers
>
>> quantile( rf(10000000,10,10), .99)
>    99%
> 4.84548
>> quantile( rf(10000000,11,10), .99)
>     99%
> 4.770002
>> quantile( rf(10000000,11,11), .99)
>     99%
> 4.465655
>> quantile( rf(10000000,10,11), .99)
>     99%
> 4.539423
>
>
>>> I either found a bug in the F distribution, or I'm really messed up.
>>>
>>> From a table I find
>>>
>>> dfnum  dfden  F(P<.01)
>>> 10      10     4.85
>>> 11      10     4.78
>>> 11      11     4.46
>>> 10      11     4.54
>>>
>>> So let's calculate the same quantities using numpy...
>>>
>>> import scipy.stats as stats
>>> import numpy as np
>>> In [89]: stats.scoreatpercentile(np.random.f(10,10,1000000), 99)
>>> Out[89]: 4.8575912131878365
>>> In [90]: stats.scoreatpercentile(np.random.f(11,10,1000000), 99)
>>> Out[90]: 5.2721528315236501
>>> In [91]: stats.scoreatpercentile(np.random.f(11,11,1000000), 99)
>>> Out[91]: 4.4695161332631841
>>> In [92]: stats.scoreatpercentile(np.random.f(10,11,1000000), 99)
>>> Out[92]: 4.1229323443042674
>>>
>>>
>>> So at 10,10 and 11,11 it works (maybe), but all the other values are clearly
>>> off. I tried re-running the example I put into the documentation last summer,
>>> which worked, and I don't get the right answer any longer.
>
>
> --
> -----------------------------------------------------------------------
> | Alan K. Jackson            | To see a World in a Grain of Sand      |
> | alan at ajackson.org          | And a Heaven in a Wild Flower,         |
> | www.ajackson.org           | Hold Infinity in the palm of your hand |
> | Houston, Texas             | And Eternity in an hour. - Blake       |
> -----------------------------------------------------------------------
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

I don't see any problem here. If you can replicate your results, we
would need more information about the versions.

Josef

'''
>>> np.version.version
'1.3.0'
>>> scipy.version.version
'0.8.0.dev5789'
'''

import numpy as np
from scipy import stats

print stats.scoreatpercentile(np.random.f(10,10,1000000), 99)
print stats.scoreatpercentile(np.random.f(11,10,1000000), 99)
print stats.scoreatpercentile(np.random.f(11,11,1000000), 99)
print stats.scoreatpercentile(np.random.f(10,11,1000000), 99)
print stats.f.ppf(0.99,[10,11,11,10],[10,10,11,11])

'''
output of 3 runs
4.8481204521
4.76654256549
4.463383254
4.52687574568
[ 4.8491468   4.77151806  4.46243604  4.53928181]
>>>
4.84318380095
4.76241120816
4.45605908509
4.52990235306
[ 4.8491468   4.77151806  4.46243604  4.53928181]
>>>
4.82112829219
4.77673133327
4.46580427304
4.54572040722
[ 4.8491468   4.77151806  4.46243604  4.53928181]
'''

for _ in range(10):
    rvsf = np.random.f(11,10,100000)
    print np.sum(rvsf>4.77151806)/100000.

'''
output of loop
0.0098
0.01016
0.01045
0.00995
0.00999
0.01029
0.01007
0.00973
0.00964
0.01006
'''