[SciPy-User] How to fit parameters of beta distribution?

John Reid j.reid at mail.cryst.bbk.ac.uk
Sat Jun 25 06:13:09 EDT 2011


On 24/06/11 14:32, josef.pktd at gmail.com wrote:
> On Fri, Jun 24, 2011 at 9:09 AM, John Reid <j.reid at mail.cryst.bbk.ac.uk> wrote:
>>
>>
>> On 24/06/11 13:58, josef.pktd at gmail.com wrote:
>>> On Fri, Jun 24, 2011 at 8:37 AM, John Reid <j.reid at mail.cryst.bbk.ac.uk> wrote:
>>>> Thanks for the information. Just out of interest, this is what I get on
>>>> scipy 0.7 (no warnings)
>>>>
>>>> In [1]: import scipy.stats
>>>>
>>>> In [2]: scipy.stats.beta.fit([.5])
>>>> Out[2]:
>>>> array([  1.87795851e+00,   1.81444871e-01,   2.39026963e-04,
>>>>            4.99760973e-01])
>>>>
>>>> In [3]: scipy.__version__
>>>> Out[3]: '0.7.0'
>>>>
>>>> Also I have (following your advice):
>>>>
>>>> In [7]: scipy.stats.beta.fit([.5], floc=0., fscale=1.)
>>>> Out[7]:
>>>> array([  1.87795851e+00,   1.81444871e-01,   2.39026963e-04,
>>>>            4.99760973e-01])
>>>>
>>>> which just seems wrong. Surely the loc and scale in the output should be
>>>> what I specified in the arguments? In any case, from your example it
>>>> seems to be fixed in 0.9.
>>>
>>> floc and fscale were added in scipy 0.9; extra keywords on 0.7 were just ignored
>>
>> OK
>>
>>>
>>>>
>>>> I'm assuming fit() does a ML estimate of the parameters which I think is
>>>> fine to do for a beta distribution and one data point.
>>>
>>> You need at least as many observations as parameters, and without
>>> enough observations the estimate will be very noisy. With fewer
>>> observations than parameters, you cannot identify the parameters.
>>
>> I'm not quite sure what you mean by "identify". It is a ML estimate
>> isn't it? That seems legitimate here but it wasn't really my original
>> question. I was just using [.5] as an example.
>
> simplest example: fit a linear regression line through one point.
> There are an infinite number of solutions, that all fit the point
> exactly. So we cannot estimate constant and slope, but if we fix one,
> we can estimate the other parameter.
Agreed, although a linear regression is not a beta distribution.
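
For reference, the one-point regression case can be sketched with numpy as
below. The point (1, 2) and the candidate slopes are made-up values for
illustration only, not anything from this thread.

```python
import numpy as np

# A single made-up observation (x, y) = (1, 2).
x0, y0 = 1.0, 2.0

# For any slope, choosing constant = y0 - slope * x0 fits the point exactly,
# so the data cannot distinguish between these candidate lines.
slopes = [-1.0, 0.0, 3.0]
residuals = []
for slope in slopes:
    constant = y0 - slope * x0
    fitted = constant + slope * x0
    residuals.append(y0 - fitted)

# The same problem as a least-squares system: columns are [constant, x].
X = np.array([[1.0, x0]])
y = np.array([y0])
coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

# rank is 1 while there are 2 parameters, so lstsq just returns one of the
# infinitely many exact solutions (the minimum-norm one).
print("residuals:", residuals)
print("rank:", rank, "parameters:", X.shape[1])

# Fixing one parameter (here slope = 0) pins down the other.
constant_given_zero_slope = y0 - 0.0 * x0
print("constant with slope fixed at 0:", constant_given_zero_slope)
```

Each candidate line has zero residual, which is the "infinite number of
solutions" situation described above.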

>
> Or, in Christoph's example below you just get a mass point, a degenerate
> solution; in other cases the Hessian will be singular.
>

I agree that a ML estimate of a Gaussian's variance makes little sense 
from one data point. In the case of a beta distribution, the ML estimate 
is more useful. I would prefer a Bayesian approach with a prior and full 
posterior but that could lead to another debate. But anyway I'm not 
trying to estimate the parameters from one data point, it was just an 
example.
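
For a more realistic case, here is a minimal sketch of fitting only the two
shape parameters with loc and scale held fixed. It assumes scipy 0.9 or later
(where floc and fscale are supported, as noted above). The sample size, seed
and "true" shape values are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic data on (0, 1); the shapes 2 and 5 are arbitrary illustration values.
rng = np.random.RandomState(0)
data = rng.beta(2.0, 5.0, size=500)

# Hold loc and scale fixed at the standard support [0, 1] so that only the
# two shape parameters are estimated by maximum likelihood.
a, b, loc, scale = stats.beta.fit(data, floc=0.0, fscale=1.0)

print("a = %.3f, b = %.3f, loc = %r, scale = %r" % (a, b, loc, scale))
```

With loc and scale fixed, the returned loc and scale should simply be the
values passed in, and only a and b are fitted.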

John.