[SciPy-User] max likelihood

Skipper Seabold jsseabold at gmail.com
Mon Jun 21 21:43:11 EDT 2010


On Mon, Jun 21, 2010 at 8:41 PM, David Goldsmith
<d.l.goldsmith at gmail.com> wrote:
> On Mon, Jun 21, 2010 at 5:19 PM, <josef.pktd at gmail.com> wrote:
>>
>> On Mon, Jun 21, 2010 at 8:03 PM, David Goldsmith
>> <d.l.goldsmith at gmail.com> wrote:
>> > On Mon, Jun 21, 2010 at 4:10 PM, <josef.pktd at gmail.com> wrote:
>> >>
>> >> On Mon, Jun 21, 2010 at 7:03 PM, David Goldsmith
>> >> <d.l.goldsmith at gmail.com> wrote:
>> >> > On Mon, Jun 21, 2010 at 3:17 PM, Skipper Seabold
>> >> > <jsseabold at gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> On Mon, Jun 21, 2010 at 5:55 PM, David Goldsmith
>> >> >> <d.l.goldsmith at gmail.com> wrote:
>> >> >> > On Mon, Jun 21, 2010 at 2:43 PM, Skipper Seabold
>> >> >> > <jsseabold at gmail.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> On Mon, Jun 21, 2010 at 5:34 PM, David Goldsmith
>> >> >> >> <d.l.goldsmith at gmail.com> wrote:
>> >> >> >> > On Mon, Jun 21, 2010 at 2:17 PM, eneide.odissea
>> >> >> >> > <eneide.odissea at gmail.com>
>> >> >> >> > wrote:
>> >> >> >> >>
>> >> >> >> >> Hi All
>> >> >> >> >> I had a look at the scipy.stats documentation and I was not able
>> >> >> >> >> to find a function for maximum likelihood parameter estimation.
>> >> >> >> >> Do you know whether it is available in some other
>> >> >> >> >> namespace/library of scipy?
>> >> >> >> >> I found a few libraries on the web ( this one is an
>> >> >> >> >> example http://bmnh.org/~pf/p4.html ) having it,
>> >> >> >> >> but I would prefer to start playing with what scipy already
>> >> >> >> >> offers by default ( if any ).
>> >> >> >> >> Kind Regards
>> >> >> >> >> eo
>> >> >> >> >
>> >> >> >> > scipy.stats.distributions.rv_continuous.fit (I was just working
>> >> >> >> > on the docstring for that; I don't believe my changes have been
>> >> >> >> > merged; I believe Travis recently updated its code...)
>> >> >> >> >
>> >> >> >>
>> >> >> >> This is for fitting the parameters of a distribution via maximum
>> >> >> >> likelihood, given that the DGP is the underlying distribution.  I
>> >> >> >> don't think it is intended for more complicated likelihood
>> >> >> >> functions (where Nelder-Mead might fail).  And in any event it
>> >> >> >> will only find the parameters of the distribution rather than the
>> >> >> >> parameters of some underlying model, if this is what you're after.
>> >> >> >>
>> >> >> >> Skipper
>> >> >> >
>> >> >> > OK, but just for clarity in my own mind: are you saying that
>> >> >> > rv_continuous.fit is _definitely_ inappropriate/inadequate for
>> >> >> > OP's needs (i.e., am I _completely_ misunderstanding the
>> >> >> > relationship between the function and OP's stated needs), or are
>> >> >> > you saying that the function _may_ not be general/robust enough
>> >> >> > for OP's stated needs?
>> >> >>
>> >> >> Well, I guess it depends on exactly what kind of likelihood function
>> >> >> is being optimized.  That's why I asked.
>> >> >>
>> >> >> My experience with stats.distributions is all of about a week, so I
>> >> >> could be wrong, but here goes... rv_continuous is not intended to
>> >> >> be used on its own but rather as the base class for any
>> >> >> distribution.
>> >> >> So if you believe that your data came from, say, a Gaussian
>> >> >> distribution, then you could use norm.fit(data) (with other options
>> >> >> as needed) to get back estimates of scale and location.  So
>> >> >>
>> >> >> In [31]: from scipy.stats import norm
>> >> >>
>> >> >> In [32]: import numpy as np
>> >> >>
>> >> >> In [33]: x = np.random.normal(loc=0,scale=1,size=1000)
>> >> >>
>> >> >> In [34]: norm.fit(x)
>> >> >> Out[34]: (-0.043364692830314848, 1.0205901804210851)
>> >> >>
>> >> >> Which is close to our given location and scale.
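The same norm.fit pattern applies to any distribution derived from
rv_continuous. As a minimal sketch (the gamma shape and scale values
below are made up purely for illustration), fitting a gamma distribution
looks like this:

import numpy as np
from scipy.stats import gamma

# simulated data with a known shape and scale (values chosen for illustration)
data = np.random.gamma(2.0, 3.0, size=1000)

# gamma.fit returns (shape, loc, scale); loc is estimated as well here,
# so the recovered shape and scale will only be approximately (2, 3)
shape_hat, loc_hat, scale_hat = gamma.fit(data)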
>> >> >>
>> >> >> But if you had in mind some kind of data generating process for your
>> >> >> model based on some other observed data, and you were interested in
>> >> >> the marginal effects of changes in the observed data on the outcome,
>> >> >> then I think it would be cumbersome to use the fit method in
>> >> >> distributions. It may not be possible.  Also, as mentioned, fit only
>> >> >> uses Nelder-Mead (optimize.fmin with the default parameters, which
>> >> >> I've found to be inadequate for even fairly basic likelihood-based
>> >> >> models), so it may not be robust enough.  At the moment, I can't
>> >> >> think of a way to fit a parameterized model as fit is written now.
>> >> >> Come to think of it, though, I don't think it would be much work to
>> >> >> extend the fit method to work for something like a linear regression
>> >> >> model.
>> >> >>
>> >> >> Skipper
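As a concrete sketch of the kind of extension described above, one way to
fit a linear regression by maximum likelihood today is to write the
negative log-likelihood by hand and hand it to optimize.fmin (Nelder-Mead).
Everything below, the simulated data, the log-sigma parameterization, and
the starting values, is illustrative and not part of scipy.stats:

import numpy as np
from scipy import optimize

# simulated data, purely illustrative: y = 1 + 2*x + e with e ~ N(0, 0.5**2)
np.random.seed(0)
nobs = 200
x = np.random.uniform(0, 10, size=nobs)
X = np.column_stack((np.ones(nobs), x))
y = np.dot(X, [1.0, 2.0]) + 0.5 * np.random.normal(size=nobs)

def negloglike(params, y, X):
    # params = (beta0, beta1, log_sigma); optimizing log(sigma) keeps sigma > 0
    beta = params[:-1]
    sigma2 = np.exp(2.0 * params[-1])
    resid = y - np.dot(X, beta)
    n = len(y)
    return 0.5 * n * np.log(2 * np.pi * sigma2) + np.sum(resid**2) / (2 * sigma2)

start = [0.0, 0.0, 0.0]
est = optimize.fmin(negloglike, start, args=(y, X))
beta_hat = est[:2]            # should be close to (1, 2)
sigma_hat = np.exp(est[-1])   # should be close to 0.5

Parameterizing the error standard deviation through its log is just a
convenience to keep the unconstrained Nelder-Mead search away from
invalid (non-positive) values of sigma.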
>> >> >
>> >> >
>> >> > OK, this is all as I thought (e.g., fit only "works" to get the MLEs
>> >> > from data for a *presumed* distribution, but it is all-but-useless if
>> >> > the distribution isn't (believed to be) "known" a priori); just wanted
>> >> > to be sure I was reading you correctly. :-)  Thanks!
>> >>
>> >> MLE always assumes that the distribution is known, since you need the
>> >> likelihood function.
>> >
>> > I'm not sure what I'm missing here (is it the definition of DGP? the
>> > meaning of Nelder-Mead? I want to learn, off-list if this is considered
>> > "noise"): according to my reference - Bain & Engelhardt, Intro. to
>> > Prob. and Math. Stat., 2nd Ed., Duxbury, 1992 - if the underlying
>> > population distribution is known, then the likelihood function is
>> > well-determined (although the likelihood equation(s) it gives rise to
>> > may not be soluble analytically, of course).  So why doesn't the OP
>> > knowing the underlying distribution (as your comment above implies they
>> > should if they seek MLEs) imply that s/he would also "know" what the
>> > likelihood function "looks like" (and thus the question isn't so much
>> > what the likelihood function "looks like," but what the underlying
>> > distribution is, and thence, whether we have that distribution
>> > implemented yet in scipy.stats)?
>>
>> DGP: data generating process
>>
>> In many cases the assumed distribution of the error or noise variable
>> is just the normal distribution. But what is the overall model that
>> explains the endogenous variable?
>> distribution.fit would just assume that each observation is a random
>> draw from the same population distribution.
>>
>> But you can do MLE on standard linear regression, systems of equations,
>> ARIMA or GARCH in time series analysis. For any of these we need to
>> specify the relationship between the endogenous variable, its own past,
>> and other explanatory variables.
>> e.g. the simplest ARMA:
>>
>> A(L) y_t = B(L) e_t
>> with e_t an independently and identically distributed (iid) normal
>> random variable,
>> A(L), B(L) lag-polynomials,
>> and for the full MLE we would also need to specify initial conditions.
>>
>> simple linear regression with non-iid errors:
>> y_t = x_t * beta + e_t,  with e = {e_t}_{for all t} distributed
>> N(0, Sigma), plus assumptions on the structure of Sigma.
>>
>> In these cases the likelihood function defines a lot more than just
>> the distribution of the error term.
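As a rough sketch of the ARMA case above, the conditional Gaussian
log-likelihood for an ARMA(1,1) model y_t = phi*y_{t-1} + e_t +
theta*e_{t-1} can be written out and maximized directly. The conditioning
on e_0 = 0, the simulated data, and the starting values below are all
assumptions made for illustration:

import numpy as np
from scipy import optimize

def arma11_negloglike(params, y):
    # params = (phi, theta, log_sigma); conditional on y_0 and on e_0 = 0
    phi, theta, logsig = params
    sigma2 = np.exp(2.0 * logsig)
    e = np.zeros_like(y)
    for t in range(1, len(y)):
        e[t] = y[t] - phi * y[t - 1] - theta * e[t - 1]
    n = len(y) - 1
    return 0.5 * n * np.log(2 * np.pi * sigma2) + np.sum(e[1:]**2) / (2 * sigma2)

# simulate an ARMA(1,1) series with phi=0.6, theta=0.3, sigma=1 (illustrative)
np.random.seed(0)
nobs = 500
eps = np.random.normal(size=nobs)
y = np.zeros(nobs)
for t in range(1, nobs):
    y[t] = 0.6 * y[t - 1] + eps[t] + 0.3 * eps[t - 1]

est = optimize.fmin(arma11_negloglike, [0.1, 0.1, 0.0], args=(y,))
phi_hat, theta_hat = est[0], est[1]
sigma_hat = np.exp(est[2])

A full (exact) MLE would also treat the initial conditions mentioned
above instead of simply conditioning on them.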
>
> Ah, now I understand (I think).  So in the general case, the procedure
> needs to "apportion" the information in the data among the parameters of
> the "mechanistic" part of the model and the parameters of the "random
> noise" part of the model, and the maximum likelihood equations give you
> the values of all these parameters (the mechanistic ones and the noise
> ones) that maximize the likelihood of observing the data one observed,
> correct?
>

Yes, I think you've got it for the more general case that Josef describes.

Skipper


