[SciPy-Dev] directional derivatives in Wolfe line search

Sara Fridovich-Keil sarafridov at gmail.com
Wed Jul 28 12:50:20 EDT 2021


>  The first hurdle is to ensure that correctness isn't affected. The second hurdle is to show that changes improve things. I'm not sure that we have a comprehensive test suite for these kinds of comparisons for scipy. There are functions in benchmarks/test_functions.py, but we don't have something more comprehensive like CUTEST. It would be good to have that!

I ended up translating the Moré and Wild CUTEST benchmark into python (from matlab) for my own experiment anyway, so I’d be happy to include that as a benchmark for scipy.

> If I understand correctly, according to that graph your implementation in orange is better than the existing scipy implementation?

It’s a bit nuanced: the current scipy implementation is faster than my orange implementation on some problems, but on other problems my orange implementation converges while the current scipy one gets stuck. I suspect the former is because of the fancier finite differencing in scipy, and the latter might be because I added a small safeguard to the BFGS update (since the benchmark problems are nonconvex): if BFGS ever proposes an ascent direction, I replace it with steepest descent for that step and reset the BFGS matrix to the identity.
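For concreteness, the safeguard I mean looks roughly like this (a sketch only; the function name and the exact reset policy are my own, not scipy internals):

```python
import numpy as np

def safeguarded_direction(Hk, gk):
    """Return a descent direction, falling back to steepest descent.

    Hk is the current inverse-Hessian approximation, gk the gradient at
    the current iterate. If the quasi-Newton step -Hk @ gk is not a
    descent direction (possible on nonconvex problems), use -gk instead
    and reset the inverse-Hessian approximation to the identity.
    """
    pk = -Hk.dot(gk)
    if np.dot(pk, gk) >= 0:       # not a descent direction
        pk = -gk                  # steepest descent for this step
        Hk = np.eye(len(gk))      # reset BFGS matrix to identity
    return pk, Hk
```

The check costs one dot product per iteration and only fires when the curvature information has gone bad.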

> I'm guessing that the modification would be:
> 
> 1. If user provides a callable(jac), then use that in derphi.
> 2. If jac is estimated via a FD method then estimate derphi by finite difference along the search direction, as you suggest. Estimate grad by FD of `fun`. Both grad and derphi are calculated in line_search_wolfe1, line_search_wolfe2.

Yes, I think this would be good. 
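As a sketch of point 2, the directional derivative phi'(alpha) along the search direction can be estimated with a single extra function evaluation, instead of the n evaluations a full finite-difference gradient needs (the helper name and signature here are hypothetical, not scipy's line-search internals):

```python
import numpy as np

def derphi_fd(fun, xk, pk, alpha, eps=1.4901161193847656e-08):
    """Forward-difference estimate of phi'(alpha), where
    phi(alpha) = fun(xk + alpha * pk).

    Steps a distance eps along the (unnormalized) search direction pk,
    so this costs one extra call to fun per line-search trial point.
    """
    x = xk + alpha * pk
    return (fun(x + eps * pk) - fun(x)) / eps
```

For a quadratic like fun(x) = x @ x this recovers the exact directional derivative up to O(eps) forward-difference error.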

Best,
Sara

> On Jul 28, 2021, at 12:38 AM, Andrew Nelson <andyfaff at gmail.com> wrote:
> 
> On Wed, 28 Jul 2021 at 15:35, Sara Fridovich-Keil <sarafridov at gmail.com <mailto:sarafridov at gmail.com>> wrote:
> Thanks for the input. I can describe the experiment I’ve done so far that makes me believe this simpler finite differencing would be an improvement for the DFO case, but I’m not sure if there is a standard benchmark that is used for making these decisions in scipy. 
> 
>  The first hurdle is to ensure that correctness isn't affected. The second hurdle is to show that changes improve things. I'm not sure that we have a comprehensive test suite for these kinds of comparisons for scipy. There are functions in benchmarks/test_functions.py, but we don't have something more comprehensive like CUTEST. It would be good to have that!
>  
> My experiment is on the DFO benchmark of Moré and Wild (https://www.mcs.anl.gov/~more/dfo/ <https://www.mcs.anl.gov/~more/dfo/>), comparing 3 algorithms. The green curve is just calling the current implementation of fmin_bfgs. The orange curve is my own implementation of fmin_bfgs based on scipy but using a simple forward finite differencing approximation to the full gradient.
> 
> If I understand correctly, according to that graph your implementation in orange is better than the existing scipy implementation?
> 
> The blue curve is the same as the orange one, but replacing the forward finite differencing approximation of the full gradient with a forward finite differencing approximation of just the directional derivative (inside the Wolfe line search). The sampling distance for finite differencing is 1.4901161193847656e-08 (the square root of machine epsilon) for both the orange and blue curves, which is the same default as scipy (https://github.com/scipy/scipy/blob/v1.7.0/scipy/optimize/optimize.py#L1448 <https://github.com/scipy/scipy/blob/v1.7.0/scipy/optimize/optimize.py#L1448>) and the value recommended in Nocedal and Wright. The blue curve is the algorithm I’m proposing; I included the orange one to compare the simple vs. sophisticated full-gradient finite differencing methods (and any differences in the outer BFGS, which should be basically the same).
> 
> Note that the finite differencing for most of the optimizers is done by approx_derivative (https://github.com/scipy/scipy/blob/master/scipy/optimize/_numdiff.py#L257 <https://github.com/scipy/scipy/blob/master/scipy/optimize/_numdiff.py#L257>), which chooses the step size automatically. If `jac=None` for `_minimize_bfgs` then the default is for absolute steps, but you can use relative steps by specifying `jac='2-point'`, etc.
> 
>  I'm guessing that the modification would be:
> 
> 1. If user provides a callable(jac), then use that in derphi.
> 2. If jac is estimated via a FD method then estimate derphi by finite difference along the search direction, as you suggest. Estimate grad by FD of `fun`. Both grad and derphi are calculated in line_search_wolfe1, line_search_wolfe2.
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev

