[SciPy-User] max likelihood

Mon Jun 21 18:51:21 EDT 2010

On Mon, Jun 21, 2010 at 6:17 PM, Skipper Seabold <jsseabold at gmail.com> wrote:
> On Mon, Jun 21, 2010 at 5:55 PM, David Goldsmith
> <d.l.goldsmith at gmail.com> wrote:
>> On Mon, Jun 21, 2010 at 2:43 PM, Skipper Seabold <jsseabold at gmail.com>
>> wrote:
>>>
>>> On Mon, Jun 21, 2010 at 5:34 PM, David Goldsmith
>>> <d.l.goldsmith at gmail.com> wrote:
>>> > On Mon, Jun 21, 2010 at 2:17 PM, eneide.odissea
>>> > <eneide.odissea at gmail.com>
>>> > wrote:
>>> >>
>>> >> Hi All
>>> >> I had a look at the scipy.stats documentation and I was not able to
>>> >> find a
>>> >> function for
>>> >> maximum likelihood parameter estimation.
>>> >> Do you know whether it is available in some other namespace/library of
>>> >> scipy?
>>> >> I found a few libraries on the web ( this one is an
>>> >> example http://bmnh.org/~pf/p4.html ) having it,
>>> >> but I would prefer to start playing with what scipy already offers by
>>> >> default ( if any ).
>>> >> Kind Regards
>>> >> eo
>>> >
>>> > scipy.stats.distributions.rv_continuous.fit (I was just working on the
>>> > docstring for that; I don't believe my changes have been merged; I
>>> > believe
>>> > Travis recently updated its code...)
>>> >
>>>
>>> This is for fitting the parameters of a distribution via maximum
>>> likelihood given that the DGP is the underlying distribution.  I don't
>>> think it is intended for more complicated likelihood functions (where
>>> Nelder-Mead might fail).  And in any event it will only find the
>>> parameters of the distribution rather than the parameters of some
>>> underlying model, if this is what you're after.
>>>
>>> Skipper
>>
>> OK, but just for clarity in my own mind: are you saying that
>> rv_continuous.fit is _definitely_ inappropriate/inadequate for OP's needs
>> (i.e., am I _completely_ misunderstanding the relationship between the
>> function and OP's stated needs), or are you saying that the function _may_
>> not be general/robust enough for OP's stated needs?
>
> Well, I guess it depends on exactly what kind of likelihood function
> is being optimized.  That's why I asked.
>
> My experience with stats.distributions is all of about a week, so I
> could be wrong. But here it goes... rv_continuous is not intended to
> be used on its own but rather as the base class for any distribution.
> So if you believe that your data came from, say, a Gaussian
> distribution, then you could use norm.fit(data) (with other options as
> needed) to get back estimates of scale and location.  So
>
> In [31]: from scipy.stats import norm
>
> In [32]: import numpy as np
>
> In [33]: x = np.random.normal(loc=0,scale=1,size=1000)
>
> In [34]: norm.fit(x)
> Out[34]: (-0.043364692830314848, 1.0205901804210851)
>
> Which is close to our given location and scale.
>
> But if you had in mind some kind of data generating process for your
> model based on some other observed data and you were interested in the
> marginal effects of changes in the observed data on the outcome, then
> it would be cumbersome I think to use the fit in distributions. It may
> not be possible.   Also, as mentioned, fit only uses Nelder-Mead
> (optimize.fmin with the default parameters, which I've found to be
> inadequate for even fairly basic likelihood based models), so it may
> not be robust enough.  At the moment, I can't think of a way to fit a
> parameterized model as fit is written now.  Come to think of it though
> I don't think it would be much work to extend the fit method to work
> for something like a linear regression model.

rephrasing this a bit and adding some comments:

the fit method of the distributions estimates the parameters (shapes,
loc and scale) directly, while often we want the distribution parameters,
especially loc (or mean), to depend on some explanatory variables.
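
To make the difference concrete, here is a minimal sketch (not what
fit does) of maximum likelihood where loc of a normal distribution is
a linear function of explanatory variables. The simulated data, the
helper name negloglike, the log-scale parameterization and the choice
of BFGS are all my own assumptions for illustration.

```python
import numpy as np
from scipy import optimize, stats

# Simulated data (hypothetical): y = X @ beta + normal noise
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)


def negloglike(params, y, X):
    """Negative log-likelihood with loc = X @ beta.

    The last entry of params is log(scale), so the optimizer can move
    freely while scale itself stays positive.
    """
    beta, log_scale = params[:-1], params[-1]
    return -stats.norm.logpdf(y, loc=X @ beta, scale=np.exp(log_scale)).sum()


start = np.zeros(X.shape[1] + 1)
res = optimize.minimize(negloglike, start, args=(y, X), method="BFGS")

beta_hat = res.x[:-1]
scale_hat = np.exp(res.x[-1])

# For the normal case the MLE of beta coincides with least squares,
# which gives a cheap sanity check on the optimizer result.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

Swapping stats.norm for another distribution (plus extra shape
parameters in params) gives the general idea, but whether the
optimizer behaves well then depends on the likelihood.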

Generalized Linear Models do this for the exponential family of distributions.

R has a package where any distribution parameter can be parameterized
as a (linear) function of some explanatory variables. This would not
be too difficult to implement, but I'm not sure how well established
the theory and algorithms are outside of the exponential family and
some specific distributions. Also in many cases it will not be obvious
that the likelihood function is well (enough) behaved.

I looked at the case for the t distribution, because I want it for
GARCH, but even there it is not completely clear whether the
parameterization of the t distribution should use the standard
t-distribution or the standardized t-distribution (rescaled so that var=1).
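
A small sketch of the two parameterizations, assuming df > 2 so that
the variance exists. The value df = 5 and the variable names are just
for illustration.

```python
import numpy as np
from scipy import stats

df = 5.0  # illustrative value; the variance is finite only for df > 2

# Standard t: scale=1, so the variance is df / (df - 2), not 1
var_standard = stats.t.var(df)

# Standardized t: choose the scale so that the variance is exactly 1
unit_var_scale = np.sqrt((df - 2.0) / df)
var_standardized = stats.t.var(df, scale=unit_var_scale)

print(var_standard, var_standardized)
```

With the standard version the scale parameter of the model is not the
standard deviation, which matters when the scale is supposed to be the
conditional volatility.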

It would be easy to do a quick job, but more time consuming to get it
to work correctly for many cases/distributions.

Josef
BTW: I haven't touched fit in stats.distributions in a long time, the
new version is all Travis'

>
> Skipper