[SciPy-User] How to fit parameters of beta distribution?

Sat Jun 25 06:13:09 EDT 2011

On 24/06/11 14:32, josef.pktd at gmail.com wrote:
> On Fri, Jun 24, 2011 at 9:09 AM, John Reid <j.reid at mail.cryst.bbk.ac.uk> wrote:
>>
>>
>> On 24/06/11 13:58, josef.pktd at gmail.com wrote:
>>> On Fri, Jun 24, 2011 at 8:37 AM, John Reid <j.reid at mail.cryst.bbk.ac.uk> wrote:
>>>> Thanks for the information. Just out of interest, this is what I get on
>>>> scipy 0.7 (no warnings)
>>>>
>>>> In [1]: import scipy.stats
>>>>
>>>> In [2]: scipy.stats.beta.fit([.5])
>>>> Out[2]:
>>>> array([  1.87795851e+00,   1.81444871e-01,   2.39026963e-04,
>>>>            4.99760973e-01])
>>>>
>>>> In [3]: scipy.__version__
>>>> Out[3]: '0.7.0'
>>>>
>>>> Also I have (following your advice):
>>>>
>>>> In [7]: scipy.stats.beta.fit([.5], floc=0., fscale=1.)
>>>> Out[7]:
>>>> array([  1.87795851e+00,   1.81444871e-01,   2.39026963e-04,
>>>>            4.99760973e-01])
>>>>
>>>> which just seems wrong. Surely the loc and scale in the output should be
>>>> what I specified in the arguments? In any case, from your example it
>>>> seems to be fixed in 0.9.
>>>
>>> floc and fscale were added in scipy 0.9; extra keywords in 0.7 were just ignored
>>
>> OK
>>
>>>
>>>>
>>>> I'm assuming fit() does an ML estimate of the parameters, which I think is
>>>> fine to do for a beta distribution and one data point.
>>>
>>> You need at least as many observations as parameters, and without
>>> enough observations the estimate will be very noisy. With fewer
>>> observations than parameters, you cannot identify the parameters.
>>
>> I'm not quite sure what you mean by "identify". It is an ML estimate,
>> isn't it? That seems legitimate here, but it wasn't really my original
>> question. I was just using [.5] as an example.
>
> Simplest example: fit a linear regression line through one point.
> There are infinitely many solutions that all fit the point
> exactly. So we cannot estimate both constant and slope, but if we fix one,
> we can estimate the other parameter.
Agreed, although a linear regression is not a beta distribution.
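
To make the regression point concrete, here is a small sketch. It assumes
NumPy is available, and the single observation (2, 3) is made up purely for
illustration:

```python
import numpy as np

# One observation (x, y) = (2, 3); made-up numbers for illustration.
x, y = 2.0, 3.0
X = np.array([[1.0, x]])   # design matrix: intercept column and x
Y = np.array([y])

# Two unknowns, one equation: lstsq reports rank 1 and returns only the
# minimum-norm solution out of infinitely many exact fits.
coef, _, rank, _ = np.linalg.lstsq(X, Y, rcond=None)
print(rank, coef, X @ coef)

# Fixing the intercept at 0 leaves one unknown, which is then determined.
slope = y / x
print(slope)
```

The rank being smaller than the number of coefficients is what "cannot
identify the parameters" amounts to in this case.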

>
> Or, in Christoph's example below, you just get a mass point, a degenerate
> solution; in other cases the Hessian will be singular.
>

I agree that an ML estimate of a Gaussian's variance makes little sense
from one data point. In the case of a beta distribution, the ML estimate
is more useful. I would prefer a Bayesian approach with a prior and full
posterior, but that could lead to another debate. Anyway, I'm not
trying to estimate the parameters from one data point; it was just an
example.
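
For reference, a minimal sketch of the kind of fit I actually have in mind,
with loc and scale held fixed so that only the two shape parameters are
estimated. It assumes scipy 0.9 or later (where floc and fscale are honoured)
and a reasonably recent NumPy; the sample and its "true" shapes (2, 5) are
made up for illustration:

```python
import numpy as np
from scipy import stats

# Synthetic sample, purely illustrative (shapes a=2, b=5 are an assumption).
rng = np.random.default_rng(0)
data = rng.beta(2.0, 5.0, size=500)

# Fix loc and scale so that only the shape parameters a and b are fitted.
# Needs scipy >= 0.9; older versions silently ignore these keywords.
a, b, loc, scale = stats.beta.fit(data, floc=0.0, fscale=1.0)

print(a, b, loc, scale)
```

With the keywords honoured, the returned loc and scale should be exactly the
values passed in, and a and b should land near the values used to generate
the sample.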

John.