[SciPy-User] BFGS precision vs least_sq
Andrew Nelson
andyfaff at gmail.com
Thu Oct 16 17:52:33 EDT 2014
On 16 October 2014 22:59, Matt Newville <newville at cars.uchicago.edu> wrote:
> On Wed, Oct 15, 2014 at 7:25 PM, Andrew Nelson <andyfaff at gmail.com> wrote:
>
>> Dear scipy users,
>> I am using BFGS as a final 'polishing' minimizer after using another
>> minimization technique. I thought I would check how it performs against a
>> known non-linear regression standard,
>> http://www.itl.nist.gov/div898/strd/nls/data/thurber.shtml.
>>
>> I discovered that BFGS doesn't get to the absolute known minimum for this
>> least squares problem, even if the starting point is almost the best
>> solution. Rather, it finishes with a chi2 of 5648.6 against the lowest
>> chi2 minimum of 5642.7 (using scipy.optimize.leastsq). The problem seems
>> to be 'precision loss', as reported by fmin_bfgs. I have tried adjusting
>> the gtol and epsilon options for fmin_bfgs, but did not do any better.
>> Ok, so the two chi2 are only out by 0.1%, but I would like to know why
>> BFGS can't get to the same minimum as leastsq.
>>
>>
> You mentioned the result for chi-square but not for the parameters
> themselves. Are those close to the certified values?
>
> Your observation is consistent with my experience with the NIST standard
> reference data. The tests are hard (and Thurber is listed as Higher level
> of difficulty), and leastsq() does much better than the scalar optimization
> methods from scipy both in terms of final results and number of function
> evaluations.
>
> I suspect that because leastsq() examines the full Jacboian (that is, has
> a residual array) instead of a single scalar value, it can better navigate
> the parameter space. Leastsq() mixes Gauss-Netwon and steepest descent,
> but I don't know why that would give an improvement in ultimate chi-square
> for the Thurber problem over Newton's method (which I understand BFGS to
> be).
>
> I also suspect that the NIST "certified results" are derived from analysis
> with MINPACK-1 (ie, leastsq). I don't know this for sure, but the tests
> are old enough (mid to late 1990s, perhaps older) that using MINPACK for
> the analysis would be reasonable. Furthermore, the results from leastsq
> are very, very good. For the Thurber data, the 37 data points have 5 to 7
> digits for x, and 4 digits for y. The certified parameters and their
> uncertainties, and the residual sum of squares are given to 10 digits --
> the Readme states that not all of these values are statistically
> significant, and that 3 digits are hard to get for some problems. A simple
> use of leastsq() for the Thurber data with no fussing with tolerances or
> scaling gives at least 4 correct digits for each parameter, at least 3
> digits for the parameter uncertainties and 8 digits for the residual sum of
> squares, from either of the 2 supplied starting values. Similarly
> excellent results are found for most of the NIST tests with leastsq() -- it
> often does as well as NIST says to expect. The certified values had to
> come from somewhere... why not from a non-linear least-squares using
> MINPACK? That's just to say that comparing any method to leastsq on the
> NIST data may be a biased test in favor of leastsq.
>
> OTOH, BFGS fails to alter the initial values from either starting point
> for the Thurber data for me. So the fact that you're finding 3 digits for
> residual sum of squares is a huge improvement!
>
> Whether the Thurber test is a good test for the polishing step of a global
> optimizer may be another issue. For a global optimizer you probably *want*
> a scaler minimizer -- so that maximum likelihood can be used instead of
> least-squares, for example. That is, using leastsq() for the polishing
> stage may not be suitable. And if the idea of global optimization is to
> locate good starting points for local minimization, then it shouldn't
> matter too much what local optimizer you use.
>
>
>
The starting parameters in the demo script I wrote are from a
differential_evolution minimisation followed by a BFGS 'polish'. So in
that respect I'm taking the endpoint of a BFGS run and trying to get it
even lower by running BFGS again. I then wanted to see if leastsq got any
lower. It's a bit unfair, the output vector from BFGS is very close to the
certified values.
I would've thought that both should be equally good at calculating the
gradient/Jacobian (first derivative matrix), that's simply a vector of
\frac{\partial \chi^2}{\partial x_i}. I understand that the way that they
calculate the Hessians are different.
The reason I am trying to get to the bottom of this is because I am working
on the benchmark suite for (scalar) global optimizers. Each problem should
have a way of judging if the global minimum has been reached. The easy way
is to just set an absolute tolerance for differences from the known minimum
energy e.g. 1e-4. However, that won't be fair in problems such as these -
the BFGS is very close to the minimum energy (within 0.1%), it just can't
polish to the absolute minimum like leastsq can (at least from this
starting point). But is that a reason to mark it as a fail? So for some
problems in the benchmarking perhaps there needs to be an rtol and atol.
How does one set those rtol/atol?
The polishing in differential_evolution is done by L-BFGS-S, it can't use
leastsq for the reason you mentioned. But for the least squares problems
the leastsq can be run independently.
I went through the code for bfgs and got to line_search_wolfe (or something
along those lines), to see where the precision warning message came from.
There my code reading head gave out. I was wondering if there were any
tolerances/epsilons I could set tighter to get that final drop of reduction
in energy, but I didn't get any further. Perhaps the list can explain the
situation in which the precision message is emitted?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.scipy.org/pipermail/scipy-user/attachments/20141017/8eca462d/attachment.html>
More information about the SciPy-User
mailing list