[SciPy-User] scipy.optimize named argument inconsistency

Christoph Deil deil.christoph at googlemail.com
Mon Sep 5 14:17:06 EDT 2011


On Sep 5, 2011, at 4:27 PM, josef.pktd at gmail.com wrote:

> On Mon, Sep 5, 2011 at 10:55 AM, Christoph Deil
> <deil.christoph at googlemail.com> wrote:
>> 
>> How can I compute the covariance matrix for a likelihood fit using the
>> routines in scipy.optimize?
> 
> 1300 lines in scikits.statsmodels wrapping some scipy optimizers for
> maximum likelihood estimation
> e.g. LikelihoodModel
> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/base/model.py#L81
> GenericLikelihoodModel
> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/base/model.py#L421
> in the latter I use loglike and loglikeobs, depending on whether I
> need or want the Jacobian or not.

This is a very nice class!

But installing and learning statsmodels takes a while, and wanting parameter errors from a likelihood fit
is a very common reason for using an optimizer, so I think it would be nice to include that functionality in scipy.optimize itself (see code below).

> 
>> As far as I can see the only methods that return a covariance matrix (or results
>> from which it is easy to compute the covariance matrix) are leastsq and
>> curve_fit. The problem with these is that they take a residual vector chi
>> and internally compute chi2 = (chi**2).sum() and minimize that. But for a
>> likelihood fit the quantity to be minimized is logL.sum(), i.e. no squaring
>> is required, so as far as I can see it is not possible to do a likelihood
>> fit with leastsq and curve_fit.
>> On the other hand, as Matt pointed out, most (if not all) other optimization methods
>> (e.g. fmin) don't do this squaring and summing internally; the user
>> function does it if one is doing a chi2 fit, so using these it is easy to do
>> a likelihood fit. But it is not possible to get parameter errors, because
>> these other optimizers don't return a Hessian or covariance matrix.
>> It would be nice if there were a method in scipy.optimize to compute the
>> covariance matrix for all optimizers.
> 
> fmin and the other optimizers don't have enough assumptions to figure
> out whether the (inverse) Hessian gives the covariance matrix of the
> parameters. This is only true if the objective function is the
> log-likelihood, and there is no reason fmin and the others should
> assume this.
> 
> We had a brief discussion a while ago on the mailing list which led to
> the notes in the leastsq docstring
> http://docs.scipy.org/scipy/docs/scipy.optimize.minpack.leastsq/#leastsq
> saying that cov_x is also not always the (unscaled) covariance matrix.
> 
> If the objective function comes from a different estimation method for
> example, then I think it's usually not the case that the (inverse)
> Hessian is the covariance matrix.
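
(For the log-likelihood case itself the standard asymptotic result is cov(p_hat) ≈ inv(H), where H is the Hessian of -logL evaluated at the best-fit parameters, i.e. the observed information matrix.)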

Actually, all that is missing from scipy.optimize to get parameter errors for a likelihood fit (which includes the chi2 fit and is, I believe, by far the most common use case) with any of its optimizers is a method to compute the Hessian, as already available e.g. in these packages:
http://code.google.com/p/numdifftools/source/browse/trunk/numdifftools/core.py#1134
https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/sandbox/regression/numdiff.py#L99

Would it be possible to move one of the approx_hess methods to scipy.optimize (there is already a scipy.optimize.approx_fprime)?

Looking through numpy and scipy the only differentiation method I could find was
http://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.derivative.html
which is only 1D, I believe.
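
To illustrate what I mean, here is a rough sketch of a central finite-difference Hessian
(only an illustration of the idea, with a made-up name and a naive fixed step size; the
numdifftools and statsmodels implementations linked above are more careful):

import numpy as np

def approx_hess_sketch(x0, f, epsilon=1e-5):
    """Central finite-difference Hessian of the scalar function f at x0 (sketch only)."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    hess = np.zeros((n, n))
    ei = np.zeros(n)
    ej = np.zeros(n)
    for i in range(n):
        ei[i] = epsilon
        for j in range(i, n):
            ej[j] = epsilon
            hess[i, j] = (f(x0 + ei + ej) - f(x0 + ei - ej)
                          - f(x0 - ei + ej) + f(x0 - ei - ej)) / (4 * epsilon ** 2)
            hess[j, i] = hess[i, j]
            ej[j] = 0.
        ei[i] = 0.
    return hess

With such a helper in scipy.optimize, the example below would not need statsmodels at all.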

"""This is the example from the curve_fit docstring"""
import numpy as np
from scipy.optimize import curve_fit, fmin
def func(x, a, b, c):
    return a * np.exp(-b * x) + c
p0 = (2.5, 1.3, 0.5)
x = np.linspace(0, 4, 50)
y = func(x, *p0)
np.random.seed(0)
yn = y + 0.2 * np.random.normal(size=len(x))
popt, pcov = curve_fit(func, x, yn)
print 'curve_fit results:'
print 'values:', popt
print 'errors:', np.sqrt(pcov.diagonal())

"""And here is how to compute the fit parameter values and errors
using one of the other optimizers (exemplified with fmin) and 
a method to compute the Hesse matrix"""
def chi2(pars):
    chi = yn - func(x, *pars)
    return (chi ** 2).sum()
popt = fmin(chi2, p0, disp=False)
from numpy.dual import inv
from scikits.statsmodels.sandbox.regression.numdiff import approx_hess3 as approx_hess
phess = approx_hess(popt, chi2)
def approx_covar(hess, red_chi2):
    return red_chi2 * inv(phess / 2.)
pcov = approx_covar(popt, chi2(popt) / (len(x) - len(p0)))
print 'curve_fit results:'
print 'values:', popt
print 'errors:', np.sqrt(pcov.diagonal())

curve_fit results:
values: [ 2.80720814  1.24568448  0.44517316]
errors: [ 0.12563313  0.12409886  0.05875364]
fmin and approx_hess results:
values: [ 2.80720337  1.24565936  0.44515526]
errors: [ 0.12610377  0.12802944  0.05979982]
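
(The factors in approx_covar come from the Gaussian-error case: chi2 is -2*logL up to a
constant and an unknown error scale, so inverting half the chi2 Hessian gives the unscaled
covariance, and multiplying by the reduced chi2, chi2(popt) / (n_data - n_par), estimates
that scale; as far as I understand this is the same scaling curve_fit applies to cov_x
internally.)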


> 
>> And it would be nice if the super-fast Levenberg-Marquardt optimizer called
>> by leastsq were also available for likelihood fits.
>> Maybe it would be possible to add one method to scipy.optimize to compute
>> the covariance matrix after any of the optimizers has run,
>> i.e. once the best-fit parameters have been determined?
>> cov = scipy.optimize.cov(func, pars)
>> I think this can be done by computing the Hessian via finite differences and
>> then inverting it.
>> leastsq seems to compute the Hessian from a "Jacobian". This is the part I
>> don't understand, but without going into the details, let me ask this:
>> Is there something special about the Levenberg-Marquardt optimizer that makes
>> it require the individual observations?
> 
> It uses the outer (?) product of the Jacobian (all observations) to
> find the improving directions, and the outer product of the Jacobian is
> a numerical approximation to the Hessian in the
> err = y - f(x, params) or log-likelihood case.
> The Hessian calculation can break down quite easily (not positive
> definite because of numerical problems); the outer product of the Jacobian
> is more robust in my experience and cheaper. (statsmodels
> GenericLikelihoodModel has covariance based on the Hessian, the Jacobian
> and a sandwich of the two.)

It's true that there are often numerical problems with the Hessian.
If the Jacobian method is more robust, then maybe that could be included in scipy.optimize instead of, or in addition to, the Hessian method?
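
Continuing the example above (reusing np, func, x, yn, popt, chi2, p0 and inv), a rough
sketch of the Jacobian-based variant could look like this; approx_jac_sketch is just an
illustrative name for a forward-difference Jacobian of the residual vector:

def approx_jac_sketch(pars, epsilon=1e-8):
    """Forward-difference Jacobian of the residuals yn - func(x, *pars) (sketch only)."""
    pars = np.asarray(pars, dtype=float)
    r0 = yn - func(x, *pars)
    jac = np.empty((len(r0), len(pars)))
    for k in range(len(pars)):
        dp = np.zeros(len(pars))
        dp[k] = epsilon
        jac[:, k] = ((yn - func(x, *(pars + dp))) - r0) / epsilon
    return jac

jac = approx_jac_sketch(popt)
s2 = chi2(popt) / (len(x) - len(p0))     # reduced chi2 estimates the error scale
pcov_jac = s2 * inv(np.dot(jac.T, jac))  # Gauss-Newton approximation: cov ~ s2 * inv(J^T J)
print 'errors from J^T J:', np.sqrt(pcov_jac.diagonal())

As far as I understand, this outer product of the Jacobian is essentially what leastsq's
cov_x is based on.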

> 
> But I never looked at the internals of minpack.
> 
> Josef


> 
>> Or is it just that the current implementation of _minpack._lmdif (which is
>> called by leastsq) was written such that it works this way (i.e. it includes a
>> computation of cov_x in addition to x), and that it could also be written to
>> take a scalar func like all the other optimizers?
>> Christoph
>> 



