[SciPy-dev] Proposal for more generic optimizers (posted before on scipy-user)

Michael McNeil Forbes mforbes at physics.ubc.ca
Mon Apr 16 11:39:08 EDT 2007


On 14 Apr 2007, at 1:51 PM, Matthieu Brucher wrote:
>
> One question about coupling: Useful criteria for globally convergent
> algorithms include testing the gradients and/or curvature of the
> function.
>
> Simple: the criterion takes, for the moment, the current iteration
> number, the former value, the current value, and the same for the
> parameters. It can be modified to add the gradient if needed -
> although I think the step would be a better choice?
>
>   In the Broyden algorithm, for example, these would be
> maintained by the "step" object,
>
> but the criteria object would need
> to access these.
>
> Access what exactly ?

"these == gradient/hessian information"

The criterion needs access to this information, but the question is:  
who serves it?  If the "function" can compute these, then it should  
naturally serve this information.  With the Broyden method, you  
suggest that the "step" would serve this information.  Thus, there  
are two objects (depending on the choice of method) that maintain and  
provide gradient information.

After thinking about this some more, I am beginning to like the idea
that only the "function" object should be responsible for the
Jacobian.  If the function can compute the Jacobian directly: great,
use a Newton-like method.  If it can't, then it should do its best
to approximate it (i.e. the "Broyden" part of the algorithm would be
encoded in the function object rather than the step object).

The "function" object alone then serves up information about the  
value of the function at a given point, as well as the gradient and  
hessian at that point (either exact or approximate) to the criterion,  
step, and any other objects that need it.
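Something like this, perhaps - a minimal sketch with purely
illustrative names, where a forward-difference fallback stands in
for "do its best":

import numpy as np

class Function(object):
    """Serves f(x), and falls back to a forward-difference
    gradient when no analytic gradient is supplied.
    (Illustrative sketch only, not a proposed API.)"""

    def __init__(self, f, gradient=None, eps=1e-8):
        self.f = f
        self._gradient = gradient
        self.eps = eps

    def __call__(self, x):
        return self.f(x)

    def gradient(self, x):
        if self._gradient is not None:
            return self._gradient(x)    # exact derivative available
        # otherwise approximate, one coordinate at a time
        x = np.asarray(x, dtype=float)
        fx = self.f(x)
        g = np.empty_like(x)
        for i in range(len(x)):
            dx = np.zeros_like(x)
            dx[i] = self.eps
            g[i] = (self.f(x + dx) - fx) / self.eps
        return g

The point is that callers never need to know whether the gradient is
exact or approximate.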

>   Likewise, if a "function" object can compute its
> own derivatives, then the "criterion" object should access it from
> there.
>
>
> I don't think that the criterion needs to access this, because it
> would mean it knows more than it should from an object-oriented
> point of view, but this can be discussed :)

Certain termination criteria need access to the derivatives to make
sure that they terminate correctly; such a criterion would query the
function object for this information.  Other criteria may need to
query the "step" object to find out the size of the previous steps.
The "criterion" should not maintain any of these values internally,
just rely on the values served by the other objects: this does not
break encapsulation, it just couples the objects more tightly, and
sophisticated criteria need this coupling.
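For example, a gradient-based criterion might look roughly like
this, using the call signature you describe (iteration number,
former value, current value, parameters) but pulling the derivatives
from the function object instead of storing them (names are
illustrative only):

import numpy as np

class GradientCriterion(object):
    """Stops when the gradient served by the function object is
    small.  The criterion keeps no derivative state of its own;
    it queries the function object for it."""

    def __init__(self, function, tol=1e-6):
        self.function = function
        self.tol = tol

    def __call__(self, iteration, old_value, new_value, parameters):
        g = self.function.gradient(parameters)
        return np.sqrt(np.dot(g, g)) < self.tol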

> Any ideas on how to deal with these couplings?  Perhaps the
> "function" object should maintain all the state (approximate
> Jacobians etc.).
>
> I don't think so; the function provides methods to compute the
> gradient, Hessian, ... but only the step object knows what to do
> with them: approximate a Hessian, track what was already
> approximated, ...  A step object is associated with one optimizer,
> whereas a function object can be optimized several times.  If it
> had a state, it couldn't be used with several optimizers without
> reinitializing it, and that is not intuitive enough.

The "function" object maintains all the information known about the  
function: how to compute the function, how to compute or approximate
its derivatives, and so on.  If the user does not supply code for directly
computing derivatives, but wants to use an optimization method that  
makes use of gradient information, then the function object should do  
its best to provide approximate information.  The essence behind the  
Broyden methods is to approximate the Jacobian information in a  
clever and cheap way.

I really think the natural place for this is in the "function"  
object, not the "step".

> I've been thinking about a correct architecture for several months
> now, and this is what I think is a good one:
> - a function to optimize that provides methods to compute the
> cost, the gradient, the Hessian, ... only basic stuff
> - an object that is responsible for the optimization, the glue
> between all modules -> optimizer
> - an object that tells whether the optimization has converged. It
> needs the current iteration number, the last several values, the
> parameters, and perhaps other things, but these should be discussed
> - an object that computes a new step; it takes a function to
> optimize and can have a state - to compute an approximate Hessian
> or inverse Hessian
> - a line search that can find a new candidate - section method,
> damped method, no method at all, with a state (Marquardt), ...
>
> With these five objects, I _think_ every unconstrained method can
> be expressed. For constraints, I suppose the step and the line
> search would have to be adapted, but no other module needs to be
> touched.
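For concreteness, I read the glue you describe as roughly the
following loop (all names are illustrative, not a proposed API):

class Optimizer(object):
    """Glue object: iterates step + line search until the
    criterion reports convergence.  (Illustrative sketch only.)"""

    def __init__(self, function, step, line_search, criterion):
        self.function = function
        self.step = step
        self.line_search = line_search
        self.criterion = criterion

    def optimize(self, x0):
        x = x0
        value = self.function(x)
        iteration = 0
        while True:
            direction = self.step(self.function, x)   # e.g. -gradient
            new_x = self.line_search(self.function, x, direction)
            new_value = self.function(new_x)
            if self.criterion(iteration, value, new_value, new_x):
                return new_x
            x, value = new_x, new_value
            iteration += 1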

Please describe how you think the Broyden root-finding method would  
fit within this scheme.  Which object would maintain the state of the  
approximate Jacobian?

Michael.



