[SciPy-User] BFGS precision vs least_sq

Matt Newville newville at cars.uchicago.edu
Thu Oct 16 07:59:44 EDT 2014


On Wed, Oct 15, 2014 at 7:25 PM, Andrew Nelson <andyfaff at gmail.com> wrote:

> Dear scipy users,
> I am using BFGS as a final 'polishing' minimizer after using another
> minimization technique.  I thought I would check how it performs against a
> known non-linear regression standard,
> http://www.itl.nist.gov/div898/strd/nls/data/thurber.shtml.
>
> I discovered that BFGS doesn't get to the absolute known minimum for this
> least squares problem, even if the starting point is almost the best
> solution.  Rather, it finishes with a chi2 of 5648.6 against the lowest
> chi2 minimum of 5642.7 (using scipy.optimize.leastsq).  The problem seems
> to be 'precision loss', as reported by fmin_bfgs.  I have tried adjusting
> the gtol and epsilon options for fmin_bfgs, but did not do any better.
> Ok, so the two chi2 are only out by 0.1%, but I would like to know why
> BFGS can't get to the same minimum as leastsq.
>
>
You mentioned the result for chi-square but not for the parameters
themselves.  Are those close to the certified values?

Your observation is consistent with my experience with the NIST standard
reference datasets.  The tests are hard (Thurber is listed at the higher
level of difficulty), and leastsq() does much better than scipy's scalar
optimization methods, both in final results and in number of function
evaluations.

I suspect that because leastsq() works with the full residual array (and
hence the full Jacobian) instead of a single scalar value, it can better
navigate the parameter space.  Leastsq() uses Levenberg-Marquardt, which
mixes Gauss-Newton and steepest descent, but I don't know why that would
give an improvement in ultimate chi-square for the Thurber problem over a
quasi-Newton method like BFGS.
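For concreteness, a minimal sketch of the two formulations (assuming
arrays x and y hold the 37 Thurber points and b0 is one of the two
starting vectors from the NIST page; the model is the rational function
given there):

    import numpy as np
    from scipy.optimize import leastsq, fmin_bfgs

    def model(b, x):
        # Thurber model: rational cubic/cubic in x, 7 parameters
        num = b[0] + b[1]*x + b[2]*x**2 + b[3]*x**3
        den = 1.0 + b[4]*x + b[5]*x**2 + b[6]*x**3
        return num / den

    def residuals(b, x, y):
        # leastsq works on the full residual vector...
        return y - model(b, x)

    def chi2(b, x, y):
        # ...while fmin_bfgs only ever sees the scalar sum of squares
        return np.sum(residuals(b, x, y)**2)

    b_lsq, ier = leastsq(residuals, b0, args=(x, y))
    b_bfgs = fmin_bfgs(chi2, b0, args=(x, y))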

I also suspect that the NIST "certified results" were derived from analysis
with MINPACK-1 (i.e., leastsq).  I don't know this for sure, but the tests
are old enough (mid-to-late 1990s, perhaps older) that using MINPACK for
the analysis would be reasonable.  Furthermore, the results from leastsq
are very, very good.  For the Thurber data, the 37 data points have 5 to 7
digits for x and 4 digits for y.  The certified parameters, their
uncertainties, and the residual sum of squares are given to 10 digits --
the Readme states that not all of these values are statistically
significant, and that even 3 digits are hard to get for some problems.  A
simple use of leastsq() on the Thurber data, with no fussing over
tolerances or scaling, gives at least 4 correct digits for each parameter,
at least 3 digits for the parameter uncertainties, and 8 digits for the
residual sum of squares, from either of the two supplied starting values.
Similarly excellent results are found for most of the NIST tests with
leastsq() -- it often does as well as NIST says to expect.  The certified
values had to come from somewhere... why not from a non-linear
least-squares fit using MINPACK?  That's just to say that comparing any
method to leastsq on the NIST data may be a test biased in favor of
leastsq.
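(To count correct digits, a quick hypothetical helper, with `certified`
being the parameter vector from the NIST page:)

    def digits_of_agreement(fitted, certified):
        # leading decimal digits on which fitted and certified agree
        return -np.log10(np.abs((fitted - certified) / certified))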

OTOH, for me BFGS fails to move at all from either starting point on the
Thurber data.  So the fact that you're getting 3 digits for the residual
sum of squares is a huge improvement!

Whether the Thurber test is a good test for the polishing step of a global
optimizer may be another issue.  For a global optimizer you probably *want*
a scalar minimizer -- so that maximum likelihood can be used instead of
least-squares, for example.  That is, leastsq() may not be suitable for the
polishing stage.  And if the idea of global optimization is to locate good
starting points for local minimization, then it shouldn't matter too much
which local optimizer you use.
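A sketch of what that polish step might look like with a scalar merit
function (model() as above; sigma, b_global, and the Gaussian likelihood
are placeholder assumptions, not anything from the Thurber test):

    from scipy.optimize import minimize

    def neg_log_likelihood(b, x, y, sigma):
        # for Gaussian errors this is just chi-square/2 (plus a constant),
        # but any scalar merit function can go here
        r = (y - model(b, x)) / sigma
        return 0.5 * np.sum(r**2)

    # b_global: best candidate found by the global optimizer
    result = minimize(neg_log_likelihood, b_global, args=(x, y, sigma),
                      method='BFGS')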

Not sure that really answers your question, but hope it helps....

--Matt Newville