[SciPy-Dev] Returning the (approximate) inverse hessian from L-BFGS-B as a LinearOperator?

Robert McGibbon rmcgibbo at gmail.com
Tue Feb 17 22:59:48 EST 2015


My medium-sized models have ~10^4 variables, so even analytic manipulations
of the Hessian get very unwieldy (a dense Hessian at that size has 10^8
entries, roughly 800 MB in double precision), and can easily take far
longer than the optimization itself.

-Robert

On Tue, Feb 17, 2015 at 7:17 PM, <josef.pktd at gmail.com> wrote:

> Sorry, I got tangled up in the silly line breaks that gmail
> inserts, and hit send several times by accident.
>
> > On Tue, Feb 17, 2015 at 10:04 PM,  <josef.pktd at gmail.com> wrote:
> >> On Tue, Feb 17, 2015 at 10:02 PM,  <josef.pktd at gmail.com> wrote:
> >>> On Tue, Feb 17, 2015 at 9:54 PM, Jascha Sohl-Dickstein
> >>> <jascha at stanford.edu> wrote:
> >>>>
> >>>> This sounds like it would be a good practical test -- but
> just BTW, it's entirely possible for the BFGS approximation to be either
> worse or better than the L-BFGS approximation. The main cause of a
> difference in accuracy is that the BFGS algorithm uses the full
> history of gradients and iterates, while L-BFGS only uses the recent history
> of gradients and iterates (I think it defaults to the last 10 steps in the
> scipy implementation). If the Hessian is constant, the BFGS approximation
> will be much better. If the Hessian changes significantly over the course
> of optimization, the BFGS approximation may actually turn out to be much
> worse.
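>
> As a quick illustration (just a sketch on the Rosenbrock test function,
> nothing from a real model): the history length is the `maxcor` option
> of the scipy wrapper, and the dense BFGS inverse-Hessian estimate is
> already exposed as `hess_inv` on the result object:
>
>     import numpy as np
>     from scipy.optimize import minimize, rosen, rosen_der
>
>     x0 = np.full(5, 2.0)
>     # Full-history BFGS returns a dense (n, n) inverse-Hessian estimate.
>     res_bfgs = minimize(rosen, x0, jac=rosen_der, method='BFGS')
>     # L-BFGS-B keeps only the last `maxcor` correction pairs (default 10).
>     res_lbfgs = minimize(rosen, x0, jac=rosen_der, method='L-BFGS-B',
>                          options={'maxcor': 10})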
> >>>>
> >>>> -jascha
> >>>>
> >>>> On Tue Feb 17 2015 at 6:27:17 PM Robert McGibbon <rmcgibbo at gmail.com>
> wrote:
> >>>>>
> >>>>> I should check the accuracy of the BFGS approximate inverse Hessian
> using the (dense) `hess_inv` that is already computed and returned by
> `fmin_bfgs`. The accuracy of that matrix should be no worse than the
> accuracy of the low-memory version computed by L-BFGS.
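>
> A sketch of what that check could look like, continuing from the BFGS
> result `res_bfgs` above (the finite-difference helper below is
> hand-rolled for illustration, not a scipy routine):
>
>     import numpy as np
>     from scipy.optimize import rosen_der
>
>     def fd_hessian(grad, x, eps=1e-6):
>         # Central differences of the gradient give a dense Hessian.
>         n = x.size
>         H = np.empty((n, n))
>         for i in range(n):
>             e = np.zeros(n)
>             e[i] = eps
>             H[:, i] = (grad(x + e) - grad(x - e)) / (2 * eps)
>         return 0.5 * (H + H.T)           # symmetrize
>
>     H_inv = np.linalg.inv(fd_hessian(rosen_der, res_bfgs.x))
>     rel_err = (np.linalg.norm(res_bfgs.hess_inv - H_inv)
>                / np.linalg.norm(H_inv))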
> >>>>>
> >>>>> -Robert
> >>>>>
> >>>>> On Tue, Feb 17, 2015 at 6:20 PM, Robert McGibbon <rmcgibbo at gmail.com>
> wrote:
> >>>>>>
> >>>>>> > My guess, without knowing what l-bfgs-b is exactly doing, is that
> the approximation might not be very good.
> >>>>>>
> >>>>>> Yeah, that might be true. But the downstream approximations that go
> into calculating error bars from these
> >>>>>> hessians (that the likelihood is quadratic) can also be pretty bad
> in many of the situations where
> >>>>>> they're commonly applied, so the whole enterprise is
> semi-quantitative at best.
> >>>>>>
> >>>>>> In my application, even with some tricks to calculate the Hessian
> faster, the optimizer can often converge to the
> >>>>>> MLE _much_ faster than I can calculate a single Hessian. So
> spending, say, 1-10% of my computational effort
> >>>>>> finding the MLE and 90-99% calculating error bars seems silly when
> in the end I don't even have much confidence
> >>>>>> in the approximations that go from hessian -> error bars, and I'm
> looking at them more from an order-of-magnitude
> >>>>>> perspective.
> >>>
>
> In statsmodels we use l-bfgs-b without bounds as the default optimizer
> in some models, but we still calculate the Hessian ourselves using our
> own numerical derivatives functions.
> I don't think our timing ratio is that bad, but the Hessian
> calculation can be slow. In most cases we have managed to get an
> analytical Hessian, especially in cases where the "inner" part is
> linear (x * beta).
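>
> For concreteness, a minimal sketch of that pattern, assuming a
> user-supplied negative loglikelihood `nloglike` and its minimizer
> `xhat` (those two names are made up; the numdiff helper is real):
>
>     import numpy as np
>     from statsmodels.tools.numdiff import approx_hess
>
>     H = approx_hess(xhat, nloglike)      # dense numerical Hessian at the MLE
>     cov_params = np.linalg.inv(H)        # asymptotic covariance of the MLE
>     std_errors = np.sqrt(np.diag(cov_params))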
>
> If there is more structure to the likelihood function, then it may
> still be possible that using the chain rule, or automatic
> differentiation, is fast and more accurate.
>
> Another common approach in econometrics, though not yet used much in
> statsmodels, is to use the outer product of gradients (OPG), which
> requires extra memory but should be reasonably fast to calculate.
> The OPG covariance estimate is inv(s's), where s is the matrix of
> derivatives of the loglikelihood with respect to the parameters, with
> one row per observation.
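>
> In code, a rough sketch (`score_obs` is a made-up name for a function
> returning the nobs x nparams matrix of per-observation scores):
>
>     import numpy as np
>
>     s = score_obs(xhat)                  # shape (nobs, nparams)
>     cov_opg = np.linalg.inv(s.T.dot(s))  # OPG covariance estimate
>     std_errors = np.sqrt(np.diag(cov_opg))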
>
> My guess is that adding the inverse hessian operator to lbfgsb is
> worth a try, but whether it should be included in scipy will depend on
> how difficult the implementation is and on how good the approximation
> is, IMO.
>
> Josef
>
>
> >>>
> >>>>>>
> >>>>>>
> >>>>>> As for the behavior at a boundary solution, I'm honestly not sure.
> Obviously the true Hessian isn't defined for certain
> >>>>>> (i,j) pairs, but for other directions that aren't at the boundary
> it should be okay. How this looks for the BFGS implicit
> >>>>>> inverse Hessian, I don't know.
> >>>>>>
> >>>>>> -Robert
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Feb 17, 2015 at 6:05 PM, <josef.pktd at gmail.com> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Feb 17, 2015 at 8:18 PM, Robert McGibbon <
> rmcgibbo at gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Hey,
> >>>>>>>>
> >>>>>>>> tl;dr: thinking of writing a PR for the following feature & looking
> for feedback first.
> >>>>>>>>
> >>>>>>>> In the L-BFGS-B minimizer, a low-memory approximation to the
> inverse Hessian is
> >>>>>>>> used internally to determine the descent direction. Once the
> optimization terminates,
> >>>>>>>> none of this information is returned to the caller, though. I
> think it would be nice to
> >>>>>>>> return a callable for the inverse Hessian - vector product as a
> LinearOperator.
> >>>>>>>>
> >>>>>>>> The main use case I have in mind for this is computing
> standard errors in
> >>>>>>>> maximum likelihood statistical models. In this use case, you
> write down a likelihood
> >>>>>>>> function (and maybe its gradient), and pass it off to a minimizer
> to find the MLE. Getting
> >>>>>>>> standard errors in a scalar function of the parameters with the
> delta method requires
> >>>>>>>> computing a quadratic form g(x)' H^{-1}(x) g(x), where g(x)
> is a vector (the
> >>>>>>>> gradient of the scalar function at the MLE) and the matrix
> H^{-1}(x) is the inverse of
> >>>>>>>> the Hessian at the MLE.
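>
> With such a LinearOperator in hand (call it `hess_inv_op`; the name is
> just a placeholder for the proposed return value), the delta-method
> variance reduces to one operator application:
>
>     import numpy as np
>
>     g = grad_scalar(xhat)                # gradient of the scalar function
>                                          # of interest, at the MLE (both
>                                          # names are placeholders)
>     var = np.dot(g, hess_inv_op.dot(g))  # g(x)' H^{-1}(x) g(x)
>     std_err = np.sqrt(var)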
> >>>>>>>>
> >>>>>>>> For large-scale optimization of a likelihood with L-BFGS-B, the
> Hessian is too big to
> >>>>>>>> calculate (or invert) explicitly, but a low-memory approximation
> to the inverse is
> >>>>>>>> already available inside the optimizer, so I think it makes sense
> to expose it. The
> >>>>>>>> factored form of the BFGS matrix is kind of hairy, so the most
> reasonable route is
> >>>>>>>> to return it as a LinearOperator that computes approximate
> inverse Hessian - vector
> >>>>>>>> products. This calculation can be done pretty efficiently, I
> think, with the routine from
> >>>>>>>> section 4 of [1].
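>
> For reference, a sketch of what that routine looks like -- the standard
> two-loop recursion, wrapped as a LinearOperator. The (s, y) correction
> pairs would have to be extracted from the Fortran workspace; here they
> are simply passed in as lists, so this illustrates the math rather than
> the actual wiring:
>
>     import numpy as np
>     from scipy.sparse.linalg import LinearOperator
>
>     def lbfgs_inv_hess_operator(s_list, y_list, n):
>         # s_k = x_{k+1} - x_k,  y_k = grad_{k+1} - grad_k, oldest first.
>         rho = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
>         # A common choice of initial scaling, H_0 = gamma * I.
>         gamma = (np.dot(s_list[-1], y_list[-1]) /
>                  np.dot(y_list[-1], y_list[-1]))
>
>         def matvec(v):
>             q = np.asarray(v, dtype=float).ravel().copy()
>             m = len(s_list)
>             alpha = np.empty(m)
>             for i in reversed(range(m)):        # newest to oldest
>                 alpha[i] = rho[i] * np.dot(s_list[i], q)
>                 q -= alpha[i] * y_list[i]
>             r = gamma * q
>             for i in range(m):                  # oldest to newest
>                 beta = rho[i] * np.dot(y_list[i], r)
>                 r += (alpha[i] - beta) * s_list[i]
>             return r                            # approx H^{-1} v
>
>         return LinearOperator((n, n), matvec=matvec)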
> >>>>>>>>
> >>>>>>>> Does this make sense to people? Is it a desirable feature? I'm
> thinking of writing a
> >>>>>>>> PR for it, but I wanted to check here first before diving down
> the rabbit hole.
> >>>>>>>>
> >>>>>>>> [1] Nocedal, J. "Updating quasi-Newton matrices with limited
> storage." Math. Comput. 35.151 (1980): 773-782.
> >>>>>>>>
> >>>>>>>
> >>>>>>> My guess, without knowing exactly what l-bfgs-b is doing, is that
> the approximation might not be very good.
> >>>>>>> (I remember there was once a suggestion to get the Lagrange
> multipliers out of one of the constrained optimizers, but when whoever
> proposed it saw how inaccurate they were, he gave up on the idea.)
> >>>>>>>
> >>>>>>> If it's the only way to get an approximation in large problems,
> then it might still be worth it.
> >>>>>>>
> >>>>>>> Another question: How is the Hessian calculated if there are
> binding constraints?
> >>>>>>> The Hessian won't give you the correct standard errors for a
> boundary solution.
> >>>>>>>
> >>>>>>> Josef
> >>>>>>>
> >>>>>>>>
> >>>>>>>> -Robert
> >>>>>>>>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>