[SciPy-Dev] Returning the (approximate) inverse Hessian from L-BFGS-B as a LinearOperator?

josef.pktd at gmail.com
Tue Feb 17 21:05:38 EST 2015


On Tue, Feb 17, 2015 at 8:18 PM, Robert McGibbon <rmcgibbo at gmail.com> wrote:

> Hey,
>
> tl;dr: thinking of writing a PR for the following feature & looking
> for feedback first.
>
> In the L-BFGS-B minimizer, a low-memory approximation to the inverse
> Hessian is used internally to determine the descent direction. Once
> the optimization terminates, none of this information is returned to
> the client, though. I think it would be nice to return a callable
> for the inverse Hessian-vector product as a LinearOperator.
>
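> Roughly the shape I'm imagining (purely a sketch -- `hess_inv` as a
> LinearOperator is the proposed feature, not something minimize
> returns today, and `loglike`, `score`, `x0`, and `v` are
> placeholders):
>
>     from scipy.optimize import minimize
>
>     res = minimize(loglike, x0, jac=score, method='L-BFGS-B')
>     Hinv = res.hess_inv    # proposed: a scipy LinearOperator
>     Hinv.dot(v)            # approximate H^{-1} v, no dense matrix formed
>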
> The main use case I have in mind is computing standard errors in
> maximum likelihood statistical models. There, you write down a
> likelihood function (and maybe its gradient) and pass it off to a
> minimizer to find the MLE. Getting the standard error of a scalar
> function of the parameters with the delta method
> <http://en.wikipedia.org/wiki/Delta_method> requires computing a
> quadratic form g(x)' H^{-1}(x) g(x), where g(x) is the gradient of
> the scalar function at the MLE, and H^{-1}(x) is the inverse of the
> Hessian at the MLE.
>
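> Concretely, with such an operator in hand, the delta-method standard
> error is just the square root of that quadratic form (a sketch;
> `grad_f` and `theta_mle` are placeholders, and `Hinv` is the
> operator from the sketch above):
>
>     import numpy as np
>
>     g = grad_f(theta_mle)       # gradient of the scalar function at the MLE
>     var = g.dot(Hinv.dot(g))    # the quadratic form g' H^{-1} g
>     se = np.sqrt(var)
>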
> For large-scale optimization of a likelihood with L-BFGS-B, the
> Hessian is too big to calculate (or invert) explicitly, but a
> low-memory approximation to the inverse is already available inside
> the optimizer, so I think it makes sense to expose it. The factored
> form of the BFGS matrix is kind of hairy, so the most reasonable
> route is to return it as a LinearOperator that computes approximate
> inverse Hessian-vector products. This calculation can be done pretty
> efficiently, I think, with the routine from section 4 of [1].
>
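> For reference, that routine is the two-loop recursion; here is my
> own rough transcription (not the Fortran internals), working from
> the m stored correction pairs s_i = x_{i+1} - x_i and
> y_i = grad_{i+1} - grad_i:
>
>     import numpy as np
>
>     def inv_hess_vec(v, ss, ys, gamma=1.0):
>         # approximate H^{-1} v in O(m n) time from the m stored pairs
>         # (ordered oldest first); gamma * I stands in for the initial H0^{-1}
>         rho = [1.0 / y.dot(s) for s, y in zip(ss, ys)]
>         alpha = np.empty(len(ss))
>         q = v.astype(float).copy()
>         for i in reversed(range(len(ss))):    # newest pair to oldest
>             alpha[i] = rho[i] * ss[i].dot(q)
>             q -= alpha[i] * ys[i]
>         r = gamma * q
>         for i in range(len(ss)):              # oldest pair to newest
>             beta = rho[i] * ys[i].dot(r)
>             r += (alpha[i] - beta) * ss[i]
>         return r
>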
> Does this make sense to people? Is it a desirable feature? I'm thinking of
> writing a
> PR for it, but I wanted to check here first before diving down the rabbit
> hole.
>
> [1] Nocedal, J. "Updating quasi-Newton matrices with limited
> storage." Math. Comput. 35.151 (1980): 773-782.
> <http://folk.uib.no/ssu029/Pdf_file/Nocedal80.pdf>
>
>
My guess, without knowing exactly what L-BFGS-B is doing, is that the
approximation might not be very good.
(I remember there was once a suggestion to expose the Lagrange
multipliers from one of the constrained optimizers, but when whoever
proposed it saw how inaccurate they were, he gave up on the idea.)

If it's the only way to get an approximation in large problems, then it
might still be worth it.
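
One way to put a rough number on that concern (a quick sanity check
using the dense hess_inv that the plain BFGS method already returns):

    import numpy as np
    from scipy.optimize import minimize, rosen, rosen_der, rosen_hess

    res = minimize(rosen, np.array([-1.2, 1.0]), jac=rosen_der,
                   method='BFGS')
    exact = np.linalg.inv(rosen_hess(res.x))
    print(np.abs(res.hess_inv - exact).max())  # gauges the approximation error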

Another question: How is the Hessian calculated if there are binding
constraints?
The Hessian won't give you the correct standard errors for a boundary
solution.

Josef


> -Robert
>