[SciPy-dev] Problem with F distribution, or with me?

Robert Kern robert.kern at gmail.com
Sun Aug 24 16:39:50 EDT 2008


On Sun, Aug 24, 2008 at 12:49,  <josef.pktd at gmail.com> wrote:
>>On Wed, Aug 13, 2008 at 11:44,  <josef.pktd at gmail.com> wrote:
>>> I think, stats.loggamma.rvs is wrong or uses a definition that I cannot figure out.
>>
>>It isn't related to log(gamma.rvs()). It is the same distribution as the "standard" version of lgammaff in VGAM:
>>
>>  http://rss.acs.unt.edu/Rdoc/library/VGAM/html/lgammaff.html
>>
>>--
>>Robert Kern
>
> Hi,
>
> I still think there is a problem with the loggamma distribution. I am
> attaching a script that compares the random variables generated with
> scipy.stats.loggamma.rvs  with the theoretical distribution from
> scipy.stats.loggamma.pdf and the explicit formula, which is the same
> in scipy.stats.loggamma.pdf  as in the
> http://rss.acs.unt.edu/Rdoc/library/VGAM/html/lgammaff.html. The
> script produces many graphs for the range of parameters that seem
> reasonable to me.
>
> >From the histograms you can see that the fit of the sample to the
> correct pdf is very weak and seems to hold only for some parameter
> values, e.g. c=1, c=2. For c=1.5 or 1.6 which is in the range of the
> kstest in the scipy tests, the fit does not look very good.
>
> On the other hand, the log of a gamma random variable has a good fit
> to the theoretical distribution in scipy.stats.loggamma.pdf.

I believe you are correct. The implementation of the CDF and PPF for
this distribution appears to have numerical problems. The default
implementation of the RVS, uses the PPF to invert U(0,1) random
numbers and messes up significantly. Putting in log(mtrand.gamma(c))
for the RVS appears to match the PDF, but not the CDF or PPF.

<some more fiddling>

Ah. All of the references refer to the CDF as the ratio of the
incomplete gamma function gammainc(c,exp(x)) divided by gamma(c). This
was translated to special.gammainc(c,exp(x))/special.gamma(c).
*However*, it appears that Cephes' gammainc() function does this whole
ratio, not just the numerator. Removing the extraneous gamma()s gives
us stable results across a wide range of shape parameters with or
without log(mtrand.gamma(c)) (which we will use).

Thank you for your attention to these issues. It's greatly appreciated.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco



More information about the SciPy-Dev mailing list