[SciPy-User] max likelihood

Tue Jun 22 04:14:07 EDT 2010

On Tue, Jun 22, 2010 at 3:46 AM, eneide.odissea
<eneide.odissea at gmail.com> wrote:
> Hi All
> I need to use max likelihood algorithm for fitting parameters for a
> GARCH(1,1) model.
> Is the Distribution to be assumed normal?

loglike_GARCH11 assumes a normal distribution and a constant or removed mean:
http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/regression/mle.py#L1002

simple example for estimation with scipy.optimize.fmin:
http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/examples/example_garch.py#L46

normal distribution is the standard, but there are also several other
distributions that are used for garch, e.g. t-distribution.
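
For readers without access to the linked files, here is a minimal sketch of the same idea: a Gaussian GARCH(1,1) negative log-likelihood minimized with scipy.optimize.fmin. It is not the statsmodels sandbox code. The names garch11_negloglike and simulate_garch11 are hypothetical, the mean is assumed to be zero (already removed), and the initial variance is assumed to be the sample variance.

```python
import numpy as np
from scipy import optimize


def garch11_negloglike(params, eps):
    """Negative Gaussian log-likelihood of a zero-mean GARCH(1,1) series.

    Conditional variance: h_t = omega + alpha * eps_{t-1}**2 + beta * h_{t-1}
    """
    omega, alpha, beta = params
    # Reject parameters outside the positivity/stationarity region.
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return np.inf
    h = np.empty_like(eps)
    h[0] = eps.var()  # assumed initial condition: sample variance
    for t in range(1, len(eps)):
        h[t] = omega + alpha * eps[t - 1] ** 2 + beta * h[t - 1]
    return 0.5 * np.sum(np.log(2 * np.pi * h) + eps ** 2 / h)


def simulate_garch11(omega, alpha, beta, nobs, seed=0):
    """Simulate a zero-mean GARCH(1,1) series with normal innovations."""
    rng = np.random.RandomState(seed)
    eps = np.empty(nobs)
    h = omega / (1 - alpha - beta)  # start at the unconditional variance
    for t in range(nobs):
        eps[t] = np.sqrt(h) * rng.standard_normal()
        h = omega + alpha * eps[t] ** 2 + beta * h
    return eps


# Simulated data stands in for real returns in this sketch.
eps = simulate_garch11(omega=0.1, alpha=0.1, beta=0.8, nobs=2000)
start = np.array([0.05, 0.05, 0.9])  # starting values: (omega, alpha, beta)

# Nelder-Mead simplex search, as in scipy.optimize.fmin.
estimate = optimize.fmin(garch11_negloglike, start, args=(eps,), disp=False)
print("omega, alpha, beta:", estimate)
```

Swapping the normal density for another one (e.g. a t-distribution) would only change the last line of garch11_negloglike; the variance recursion stays the same.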

garch11 looks ok in my tests, but overall the garch code is still a
mess, and it was written before the recent improvement to mle in
statsmodels.

I've never seen any other GARCH code in Python.

Josef

>
> On Tue, Jun 22, 2010 at 3:43 AM, Skipper Seabold <jsseabold at gmail.com>
> wrote:
>>
>> On Mon, Jun 21, 2010 at 8:41 PM, David Goldsmith
>> <d.l.goldsmith at gmail.com> wrote:
>> > On Mon, Jun 21, 2010 at 5:19 PM, <josef.pktd at gmail.com> wrote:
>> >>
>> >> On Mon, Jun 21, 2010 at 8:03 PM, David Goldsmith
>> >> <d.l.goldsmith at gmail.com> wrote:
>> >> > On Mon, Jun 21, 2010 at 4:10 PM, <josef.pktd at gmail.com> wrote:
>> >> >>
>> >> >> On Mon, Jun 21, 2010 at 7:03 PM, David Goldsmith
>> >> >> <d.l.goldsmith at gmail.com> wrote:
>> >> >> > On Mon, Jun 21, 2010 at 3:17 PM, Skipper Seabold
>> >> >> > <jsseabold at gmail.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> On Mon, Jun 21, 2010 at 5:55 PM, David Goldsmith
>> >> >> >> <d.l.goldsmith at gmail.com> wrote:
>> >> >> >> > On Mon, Jun 21, 2010 at 2:43 PM, Skipper Seabold
>> >> >> >> > <jsseabold at gmail.com>
>> >> >> >> > wrote:
>> >> >> >> >>
>> >> >> >> >> On Mon, Jun 21, 2010 at 5:34 PM, David Goldsmith
>> >> >> >> >> <d.l.goldsmith at gmail.com> wrote:
>> >> >> >> >> > On Mon, Jun 21, 2010 at 2:17 PM, eneide.odissea
>> >> >> >> >> > <eneide.odissea at gmail.com>
>> >> >> >> >> > wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> Hi All
>> >> >> >> >> >> I had a look at the scipy.stats documentation and I was not
>> >> >> >> >> >> able to find a function for maximum likelihood parameter
>> >> >> >> >> >> estimation.
>> >> >> >> >> >> Do you know whether it is available in some other
>> >> >> >> >> >> namespace/library of scipy?
>> >> >> >> >> >> I found on the web a few libraries (this one is an example:
>> >> >> >> >> >> http://bmnh.org/~pf/p4.html) having it, but I would prefer
>> >> >> >> >> >> to start playing with what scipy already offers by default
>> >> >> >> >> >> (if any).
>> >> >> >> >> >> Kind Regards
>> >> >> >> >> >> eo
>> >> >> >> >> >
>> >> >> >> >> > scipy.stats.distributions.rv_continuous.fit (I was just
>> >> >> >> >> > working on the docstring for that; I don't believe my
>> >> >> >> >> > changes have been merged; I believe Travis recently updated
>> >> >> >> >> > its code...)
>> >> >> >> >> >
>> >> >> >> >>
>> >> >> >> >> This is for fitting the parameters of a distribution via
>> >> >> >> >> maximum likelihood given that the DGP is the underlying
>> >> >> >> >> distribution.  I don't think it is intended for more
>> >> >> >> >> complicated likelihood functions (where Nelder-Mead might
>> >> >> >> >> fail).  And in any event it will only find the parameters of
>> >> >> >> >> the distribution rather than the parameters of some
>> >> >> >> >> underlying model, if this is what you're after.
>> >> >> >> >>
>> >> >> >> >> Skipper
>> >> >> >> >
>> >> >> >> > OK, but just for clarity in my own mind: are you saying that
>> >> >> >> > rv_continuous.fit is _definitely_ inappropriate/inadequate for
>> >> >> >> > OP's needs (i.e., am I _completely_ misunderstanding the
>> >> >> >> > relationship between the function and OP's stated needs), or
>> >> >> >> > are you saying that the function _may_ not be general/robust
>> >> >> >> > enough for OP's stated needs?
>> >> >> >>
>> >> >> >> Well, I guess it depends on exactly what kind of likelihood
>> >> >> >> function is being optimized.  That's why I asked.
>> >> >> >>
>> >> >> >> My experience with stats.distributions is all of about a week, so
>> >> >> >> I could be wrong. But here it goes... rv_continuous is not
>> >> >> >> intended to be used on its own but rather as the base class for
>> >> >> >> any distribution. So if you believe that your data came from,
>> >> >> >> say, a Gaussian distribution, then you could use norm.fit(data)
>> >> >> >> (with other options as needed) to get back estimates of location
>> >> >> >> and scale.  So
>> >> >> >>
>> >> >> >> In [31]: from scipy.stats import norm
>> >> >> >>
>> >> >> >> In [32]: import numpy as np
>> >> >> >>
>> >> >> >> In [33]: x = np.random.normal(loc=0,scale=1,size=1000)
>> >> >> >>
>> >> >> >> In [34]: norm.fit(x)
>> >> >> >> Out[34]: (-0.043364692830314848, 1.0205901804210851)
>> >> >> >>
>> >> >> >> Which is close to our given location and scale.
>> >> >> >>
>> >> >> >> But if you had in mind some kind of data generating process for
>> >> >> >> your model based on some other observed data and you were
>> >> >> >> interested in the marginal effects of changes in the observed
>> >> >> >> data on the outcome, then it would be cumbersome I think to use
>> >> >> >> the fit in distributions. It may not be possible.  Also, as
>> >> >> >> mentioned, fit only uses Nelder-Mead (optimize.fmin with the
>> >> >> >> default parameters, which I've found to be inadequate for even
>> >> >> >> fairly basic likelihood based models), so it may not be robust
>> >> >> >> enough.  At the moment, I can't think of a way to fit a
>> >> >> >> parameterized model as fit is written now.  Come to think of it
>> >> >> >> though I don't think it would be much work to extend the fit
>> >> >> >> method to work for something like a linear regression model.
>> >> >> >>
>> >> >> >> Skipper
>> >> >> >
>> >> >> >
>> >> >> > OK, this is all as I thought (e.g., fit only "works" to get the
>> >> >> > MLEs from data for a *presumed* distribution, but it is
>> >> >> > all-but-useless if the distribution isn't (believed to be) "known"
>> >> >> > a priori); just wanted to be sure I was reading you correctly. :-)
>> >> >> > Thanks!
>> >> >>
>> >> >> MLE always assumes that the distribution is known, since you need
>> >> >> the likelihood function.
>> >> >
>> >> > I'm not sure what I'm missing here (is it the definition of DGP? the
>> >> > meaning of Nelder-Mead? I want to learn, off-list if this is
>> >> > considered "noise"): according to my reference - Bain & Engelhardt,
>> >> > Intro. to Prob. and Math. Stat., 2nd Ed., Duxbury, 1992 - if the
>> >> > underlying population distribution is known, then the likelihood
>> >> > function is well-determined (although the likelihood equation(s) it
>> >> > gives rise to may not be soluble analytically, of course).  So why
>> >> > doesn't the OP knowing the underlying distribution (as your comment
>> >> > above implies they should if they seek MLEs) imply that s/he would
>> >> > also "know" what the likelihood function "looks like," (and thus the
>> >> > question isn't so much what the likelihood function "looks like," but
>> >> > what the underlying distribution is, and thence, do we have that
>> >> > distribution implemented yet in scipy.stats)?
>> >>
>> >> DGP: data generating process
>> >>
>> >> In many cases the assumed distribution of the error or noise variable
>> >> is just the normal distribution. But what's the overall model that
>> >> explains the endogenous variable?
>> >> distribution.fit would just assume that each observation is a random
>> >> draw from the same population distribution.
>> >>
>> >> But you can do MLE on standard linear regression, system of equations,
>> >> ARIMA or GARCH in time series analysis. For any of these we need to
>> >> specify what the relationship between the endogenous variable and its
>> >> own past and other explanatory variables is.
>> >> e.g. simplest ARMA
>> >>
>> >> A(L) y_t = B(L) e_t
>> >> with e_t independently and identically distributed (iid.) normal
>> >> random variable
>> >> A(L), B(L) lag-polynomials
>> >> and for the full MLE we would also need to specify initial conditions.
>> >>
>> >> simple linear regression with non iid errors
>> >> y_t = x_t * beta + e_t
>> >> e = {e_t}_{for all t} distributed N(0, Sigma)
>> >> plus assumptions on the structure of Sigma
>> >>
>> >> in these cases the likelihood function defines a lot more than just
>> >> the distribution of the error term.
>> >
>> > Ah, now I understand (I think).  So in the general case, the procedure
>> > needs to "apportion" the information in the data among the parameters
>> > of the "mechanistic" part of the model and the parameters of the
>> > "random noise" part of the model, and the Maximum Likelihood Equations
>> > give you the values of all these parameters (the mechanistic ones and
>> > noise ones) that maximize the likelihood of observing the data one
>> > observed, correct?
>> >
>>
>> Yes, I think you've got it for the more general case that Josef describes.
>>
>> Skipper
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user