[SciPy-dev] Proposal for more generic optimizers (posted before on scipy-user)
Michael McNeil Forbes
mforbes at physics.ubc.ca
Mon Apr 16 14:47:59 EDT 2007
On 16 Apr 2007, at 9:07 AM, Matthieu Brucher wrote:
> I'll look in the litterature for the Broyden method, if I see the
> whole algorithm, I think I'll be able to answer your questions ;)
Basically, and approximated Jacobian is used to determine the step
direction and/or size (depending on the "step" module etc.)
The key to the Broyden approach is that the information about F(x+dx)
is used to update the approximate Jacobian (think multidimensional
secant method) in a clever way without any additional function
evaluations (there is not a unique way to do this and some choices
work better than others).
Thus, think of Broyden methods as a Quasi-Newton methods but with a
cheap and very approximate Jacobian (hence, one usually uses a robust
line search method to make sure that one is always descending).
> After thinking about this some more, I am beginning to like the idea
> that only the "function" object be responsible for the Jacobian. If
> the function can compute the Jacobian directly: great, use a newton-
> like method. If it can't, then do its best to approximate it (i.e.
> the "Broyden" part of the algorithm would be encoded in the function
> object rather than the step object."
>
>
> I think that if that if the function knows, on its own, how to
> compute the Jacobian, the hessian, ... it should provide it. When
> it does not, it shouldn't be the man sitting on the computer that
> modifies its function to add a Broyden algorithm to the function
> object. He sould only say to the optimizer that the function does
> not compute the Jacobian by using another module. What module ?
> That is a question for later. The goal of this is to have a clean
> architecture, and adding a way to compute something directly in the
> function, something that is not dependent on the function, but on
> the step, is not a good thing.
My view is that the person sitting at the computer does one of the
following things:
>>> F1 = Function(f)
>>> F2 = Function(f,opts)
>>> F3 = Function(f,df,ddf,opts)
etc.
In this first case, the object F1 can compute f(x), and will use
finite differences or some more complicated method to compute
derivatives df(x) and ddf(x) if required by the optimization
algorithm. In F2, the user provides options that specify options
about how to do these computations (for example, step size h, should
a centred difference be used? Perhaps the function is cheap and a
Richardson extrapolation should be used for higher accuracy. If f is
analytic and supports complex arguments, then the difference step
should be h=eps*1j. Maybe f has been implemented using an automatic
differentiation library etc. Just throwing out ideas here...)
In the third case, the user has explicitly provided functions to
compute the Jacobian etc. so these will be used (unless the user
specifies otherwise). In any case, all of the functors F1, F2 and F3
can be passed to various "optimizers". There would be a set of
modules behind the interface provided by Function() that implement
these various techniques for computing and/or estimating the
derivatives, including the Broyden method.
The user sitting at the computer does nothing other than select from
a set of options (opts) what methods he wants the library to use.
Note, the user could pass explicit things to Function() too, like a
custom function that computes numerical derivatives.
> The "function" object alone then serves up information about the
> value of the function at a given point, as well as the gradient and
> hessian at that point (either exact or approximate) to the criterion,
> step, and any other objects that need it.
>
> I'm OK with it as long as it is not an approximation algorithm that
> is based on gradient, ... to compute for instance the hessian. Such
> an approximation algorithm is generic, and as such it should be put
> in another module or in a function superclass.
A Function "superclass" is what I had in mind.
> ...
> Certain termination criteria need access to the derivatives to make
> sure that they terminate. It would query the function object for
> this information. Other criteria may need to query the "step" object
> to find out the size of the previous steps.
>
> The step is not the good one, it's the line search object goal to
> find the correct step size, and such intel is given back to the
> optimizer core, because there, everything is saved - everything
> should be saved with a call to recordHistory -. What could be done
> is that every object - step or line search - returns along with the
> result - the result being the step, the new candidate, ... - a
> dictionnary with such values. In that case, the criterion can
> choose what it needs directly inside it.
Yes, it seems that the optimizer should maintain information about
the history. The question I have is about the flow of information: I
imagine that the criterion object should be able to query the
optimization object for the information that it needs. We should
define an interface of things that the optimizer can serve up to the
various components. This interface can be extended as required to
support more sophisticated algorithms.
> The "criterion" should
> not maintain any of these internally, just rely on the values served
> by the other objects: this does not break the encapsulation, it just
> couples the objects more tightly, but sophisticated criteria need
> this coupling.
> For the moment, the state was not in the criterion, one cannot know
> how any time it could be called inside an optimizer. This state is
> maintained by the optimizer itself - contains the last 2 values,
> the last 2 sets of parameters -, but I suppose that if we have the
> new candidate, the step and its size, those can be removed, and so
> the dictionary chooses what it needs.
>
>
> > I don't think so, the function provides methods to compute
> > gradient, hessian, ... but only the step object knows what to do
> > with it : approximate a hessian, what was already approximated, ...
> > A step object is associated with one optimizer, a function object
> > can be optimized several times. If it has a state, it couldn't be
> > used with several optimizers without reinitializing it, and it is
> > not intuitive enough.
>
> The "function" object maintains all the information known about the
> function: how to compute the function, how to compute/approximate
> derivatives etc. If the user does not supply code for directly
> computing derivatives, but wants to use an optimization method that
> makes use of gradient information, then the function object should do
> its best to provide approximate information. The essence behind the
> Broyden methods is to approximate the Jacobian information in a
> clever and cheap way.
>
> That would mean that it can have a state, I really do not support
> this approach. The Broyden _is_ a way to get a step from a function
> that does not give some intell - Jacobian, for instance -, so it is
> not a function thing, it is a step mode.
I disagree. I think of the Broyden algorithm as a way of maintaining
the Jacobian. The way to get the step is independent of this, though
it may use the Jacobian information to help it. The Broyden part of
the algorithm is solely to approximate the Jacobian cheaply.
> ...
> Please describe how you think the Broyden root-finding method would
> fit within this scheme. Which object would maintain the state of the
> approximate Jacobian?
>
> No problem, I'll check in my two optimization books this evening
> provided I have enough time - I'm a little late in some important
> work projects :| -
Maybe I will see things differently when you do this, but I am pretty
convinced right now that the Function() object is the best place for
the Broyden part of the algorithm.
Michael.
P.S. No hurry. I might also disappear from time to time when busy;-)
More information about the SciPy-Dev
mailing list