[SciPy-User] max likelihood

Mon Jun 21 19:10:21 EDT 2010

On Mon, Jun 21, 2010 at 7:03 PM, David Goldsmith
<d.l.goldsmith at gmail.com> wrote:
> On Mon, Jun 21, 2010 at 3:17 PM, Skipper Seabold <jsseabold at gmail.com>
> wrote:
>>
>> On Mon, Jun 21, 2010 at 5:55 PM, David Goldsmith
>> <d.l.goldsmith at gmail.com> wrote:
>> > On Mon, Jun 21, 2010 at 2:43 PM, Skipper Seabold <jsseabold at gmail.com>
>> > wrote:
>> >>
>> >> On Mon, Jun 21, 2010 at 5:34 PM, David Goldsmith
>> >> <d.l.goldsmith at gmail.com> wrote:
>> >> > On Mon, Jun 21, 2010 at 2:17 PM, eneide.odissea
>> >> > <eneide.odissea at gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Hi All
>> >> >> I had a look at the scipy.stats documentation and I was not able to
>> >> >> find a function for maximum likelihood parameter estimation.
>> >> >> Do you know whether it is available in some other namespace/library
>> >> >> of scipy?
>> >> >> I found a few libraries on the web (this one is an example:
>> >> >> http://bmnh.org/~pf/p4.html) that have it, but I would prefer to
>> >> >> start playing with what scipy already offers by default (if any).
>> >> >> Kind Regards
>> >> >> eo
>> >> >
>> >> > scipy.stats.distributions.rv_continuous.fit (I was just working on
>> >> > the docstring for that; I don't believe my changes have been merged;
>> >> > I believe Travis recently updated its code...)
>> >> >
>> >>
>> >> This is for fitting the parameters of a distribution via maximum
>> >> likelihood given that the DGP is the underlying distribution.  I don't
>> >> think it is intended for more complicated likelihood functions (where
>> >> Nelder-Mead might fail).  And in any event it will only find the
>> >> parameters of the distribution rather than the parameters of some
>> >> underlying model, if this is what you're after.
>> >>
>> >> Skipper
>> >
>> > OK, but just for clarity in my own mind: are you saying that
>> > rv_continuous.fit is _definitely_ inappropriate/inadequate for OP's
>> > needs (i.e., am I _completely_ misunderstanding the relationship
>> > between the function and OP's stated needs), or are you saying that
>> > the function _may_ not be general/robust enough for OP's stated needs?
>>
>> Well, I guess it depends on exactly what kind of likelihood function
>> is being optimized.  That's why I asked.
>>
>> My experience with stats.distributions is all of about a week, so I
>> could be wrong. But here goes... rv_continuous is not intended to
>> be used on its own but rather as the base class for any distribution.
>> So if you believe that your data came from, say, a Gaussian
>> distribution, then you could use norm.fit(data) (with other options as
>> needed) to get back estimates of location and scale.  So
>>
>> In [31]: from scipy.stats import norm
>>
>> In [32]: import numpy as np
>>
>> In [33]: x = np.random.normal(loc=0,scale=1,size=1000)
>>
>> In [34]: norm.fit(x)
>> Out[34]: (-0.043364692830314848, 1.0205901804210851)
>>
>> Which is close to our given location and scale.
>>
>> But if you had in mind some kind of data generating process for your
>> model based on some other observed data and you were interested in the
>> marginal effects of changes in the observed data on the outcome, then
>> it would be cumbersome I think to use the fit in distributions. It may
>> not be possible.   Also, as mentioned, fit only uses Nelder-Mead
>> (optimize.fmin with the default parameters, which I've found to be
>> inadequate for even fairly basic likelihood based models), so it may
>> not be robust enough.  At the moment, I can't think of a way to fit a
>> parameterized model as fit is written now.  Come to think of it though
>> I don't think it would be much work to extend the fit method to work
>> for something like a linear regression model.
>>
>> Skipper
>
>
> OK, this is all as I thought (e.g., fit only "works" to get the MLEs from
> data for a *presumed* distribution, but it is all-but-useless if the
> distribution isn't (believed to be) "known" a priori); just wanted to be
> sure I was reading you correctly. :-)  Thanks!

MLE always assumes that the distribution is known, since you need the
likelihood function.

It's not non- or semi-parametric.
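
To make that concrete, here is a minimal sketch (not from the posts quoted
above) of what "you need the likelihood function" means for the linear
regression case Skipper mentions. The simulated data, the name neg_loglike
and the choice of BFGS are all assumptions made for illustration, and it
assumes a SciPy recent enough to provide optimize.minimize. The errors are
assumed Gaussian, so the likelihood can be written down explicitly and
handed to a general-purpose optimizer:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)

# Simulated data (hypothetical): y = 1.0 + 2.0 * x + Gaussian noise, sd 0.5
n = 500
x = rng.uniform(-1.0, 1.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)


def neg_loglike(params, x, y):
    """Negative Gaussian log-likelihood of y given x for a linear model."""
    intercept, slope, log_sigma = params
    sigma = np.exp(log_sigma)  # optimize log(sigma) so sigma stays positive
    resid = y - (intercept + slope * x)
    return -np.sum(stats.norm.logpdf(resid, loc=0.0, scale=sigma))


# Minimize the negative log-likelihood with a gradient-based method
start = np.array([0.0, 0.0, 0.0])
res = optimize.minimize(neg_loglike, start, args=(x, y), method="BFGS")
intercept_hat, slope_hat = res.x[0], res.x[1]
sigma_hat = np.exp(res.x[2])

# Cross-check: with Gaussian errors, the MLE of the coefficients is the
# ordinary least-squares solution
ols_slope, ols_intercept = np.polyfit(x, y, 1)
```

norm.fit alone would not do this, since it only estimates the loc and
scale of the sample it is handed; the regression structure has to enter
through the likelihood you write yourself.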

Josef

>
> DG