[SciPy-dev] Proposal for more generic optimizers (posted before on scipy-user)

Mon Apr 16 14:47:59 EDT 2007

On 16 Apr 2007, at 9:07 AM, Matthieu Brucher wrote:

> I'll look in the litterature for the Broyden method, if I see the  
> whole algorithm, I think I'll be able to answer your questions ;)

Basically, and approximated Jacobian is used to determine the step  
direction and/or size (depending on the "step" module etc.)

The key to the Broyden approach is that the information about F(x+dx)  
is used to update the approximate Jacobian (think multidimensional  
secant method) in a clever way without any additional function  
evaluations (there is not a unique way to do this and some choices  
work better than others).

Thus, think of Broyden methods as a Quasi-Newton methods but with a  
cheap and very approximate Jacobian (hence, one usually uses a robust  
line search method to make sure that one is always descending).

> After thinking about this some more, I am beginning to like the idea
> that only the "function" object be responsible for the Jacobian.  If
> the function can compute the Jacobian directly: great, use a newton-
> like method.  If it can't, then do its best to approximate it (i.e.
> the "Broyden" part of the algorithm would be encoded in the function
> object rather than the step object."
>
>
> I think that if that if the function knows, on its own, how to  
> compute the Jacobian, the hessian, ... it should provide it. When  
> it does not, it shouldn't be the man sitting on the computer that  
> modifies its function to add a Broyden algorithm to the function  
> object. He sould only say to the optimizer that the function does  
> not compute the Jacobian by using another module. What module ?  
> That is a question for later. The goal of this is to have a clean  
> architecture, and adding a way to compute something directly in the  
> function, something that is not dependent on the function, but on  
> the step, is not a good thing.

My view is that the person sitting at the computer does one of the  
following things:

 >>> F1 = Function(f)
 >>> F2 = Function(f,opts)
 >>> F3 = Function(f,df,ddf,opts)
etc.

In this first case, the object F1 can compute f(x), and will use  
finite differences or some more complicated method to compute  
derivatives df(x) and ddf(x) if required by the optimization  
algorithm.  In F2, the user provides options that specify options  
about how to do these computations (for example, step size h, should  
a centred difference be used?  Perhaps the function is cheap and a  
Richardson extrapolation should be used for higher accuracy.  If f is  
analytic and supports complex arguments, then the difference step  
should be h=eps*1j.  Maybe f has been implemented using an automatic  
differentiation library etc.  Just throwing out ideas here...)

In the third case, the user has explicitly provided functions to  
compute the Jacobian etc. so these will be used (unless the user  
specifies otherwise).  In any case, all of the functors F1, F2 and F3  
can be passed to various "optimizers".  There would be a set of  
modules behind the interface provided by Function() that implement  
these various techniques for computing and/or estimating the  
derivatives, including the Broyden method.

The user sitting at the computer does nothing other than select from  
a set of options (opts) what methods he wants the library to use.   
Note, the user could pass explicit things to Function() too, like a  
custom function that computes numerical derivatives.

> The "function" object alone then serves up information about the
> value of the function at a given point, as well as the gradient and
> hessian at that point (either exact or approximate) to the criterion,
> step, and any other objects that need it.
>
> I'm OK with it as long as it is not an approximation algorithm that  
> is based on gradient, ... to compute for instance the hessian. Such  
> an approximation algorithm is generic, and as such it should be put  
> in another module or in a function superclass.

A Function "superclass" is what I had in mind.

> ...
> Certain termination criteria need access to the derivatives to make
> sure that they terminate.  It would query the function object for
> this information.  Other criteria may need to query the "step" object
> to find out the size of the previous steps.
>
> The step is not the good one, it's the line search object goal to  
> find the correct step size, and such intel is given back to the  
> optimizer core, because there, everything is saved - everything  
> should be saved with a call to recordHistory -. What could be done  
> is that every object - step or line search - returns along with the  
> result - the result being the step, the new candidate, ... - a  
> dictionnary with such values. In that case, the criterion can  
> choose what it needs directly inside it.

Yes, it seems that the optimizer should maintain information about  
the history.  The question I have is about the flow of information: I  
imagine that the criterion object should be able to query the  
optimization object for the information that it needs.  We should  
define an interface of things that the optimizer can serve up to the  
various components.  This interface can be extended as required to  
support more sophisticated algorithms.

>   The "criterion" should
> not maintain any of these internally, just rely on the values served
> by the other objects: this does not break the encapsulation, it just
> couples the objects more tightly, but sophisticated criteria need
> this coupling.

> For the moment, the state was not in the criterion, one cannot know  
> how any time it could be called inside an optimizer. This state is  
> maintained by the optimizer itself - contains the last 2 values,  
> the last 2 sets of parameters -, but I suppose that if we have the  
> new candidate, the step and its size, those can be removed, and so  
> the dictionary chooses what it needs.
>
>
> > I don't think so, the function provides methods to compute
> > gradient, hessian, ... but only the step object knows what to do
> > with it : approximate a hessian, what was already approximated, ...
> > A step object is associated with one optimizer, a function object
> > can be optimized several times. If it has a state, it couldn't be
> > used with several optimizers without reinitializing it, and it is
> > not intuitive enough.
>
> The "function" object maintains all the information known about the
> function: how to compute the function, how to compute/approximate
> derivatives etc.  If the user does not supply code for directly
> computing derivatives, but wants to use an optimization method that
> makes use of gradient information, then the function object should do
> its best to provide approximate information.  The essence behind the
> Broyden methods is to approximate the Jacobian information in a
> clever and cheap way.
>
> That would mean that it can have a state, I really do not support  
> this approach. The Broyden _is_ a way to get a step from a function  
> that does not give some intell - Jacobian, for instance -, so it is  
> not a function thing, it is a step mode.

I disagree.  I think of the Broyden algorithm as a way of maintaining  
the Jacobian.  The way to get the step is independent of this, though  
it may use the Jacobian information to help it.  The Broyden part of  
the algorithm is solely to approximate the Jacobian cheaply.

> ...
> Please describe how you think the Broyden root-finding method would
> fit within this scheme.  Which object would maintain the state of the
> approximate Jacobian?
>
> No problem, I'll check in my two optimization books this evening  
> provided I have enough time - I'm a little late in some important  
> work projects :| -

Maybe I will see things differently when you do this, but I am pretty  
convinced right now that the Function() object is the best place for  
the Broyden part of the algorithm.

Michael.

P.S. No hurry.  I might also disappear from time to time when busy;-)