[SciPy-dev] Proposal for more generic optimizers (posted before on scipy-user)

Mon Apr 16 15:27:23 EDT 2007

>
> Basically, and approximated Jacobian is used to determine the step
> direction and/or size (depending on the "step" module etc.)
>
> The key to the Broyden approach is that the information about F(x+dx)
> is used to update the approximate Jacobian (think multidimensional
> secant method) in a clever way without any additional function
> evaluations (there is not a unique way to do this and some choices
> work better than others).
>
> Thus, think of Broyden methods as a Quasi-Newton methods but with a
> cheap and very approximate Jacobian (hence, one usually uses a robust
> line search method to make sure that one is always descending).

I read some doc, it seems Broyden is a class of different steps, am I wrong
? and it tries to approximate the Hessian of the function.

My view is that the person sitting at the computer does one of the
> following things:
>
> >>> F1 = Function(f)
> >>> F2 = Function(f,opts)
> >>> F3 = Function(f,df,ddf,opts)
> etc.

OK, that is not what a function is :)
A function is the set of f, df, ddf but not with the options. What you are
exposing is the construction of an optimizer ;)

Did you see the code in my different proposals ?
In fact, you have a function class - for instance Rosenbrock class - that
defines several methods, like gradient, hessian, ... without a real state -
a real state being something other than for instead the number of dimension
for the rosenbrock function, the points that need to be approximated, ... a
real state is something that is dependent of the subsequent calls to the
functor, the gradient, ... - so that this function can be reused
efficiently.
Then you use an instance of this class to be optimized.
You choose your step mode with its parameters like gradient, conjugate
gradient, Newton, Quasi-Newton, ...
You choose your line search with its own parameters - tolerance, ... - like
section methods, interpolation methods, ...
Actually, you choose your stopping criterion.
Then you make something like :
optimizer = StandardOptimizer(function = myFunction,  step = myStep,
.......)
optimizer->optimize()

That is a modular design, and that is why some basic functions must be
provided so that people that don't care about the underlying design really
do not have to care. Then if someone wants a specific, non-standard
optimizer, one just have to select the wanted modules - for instance, a
conjugate-gradient with a golden-section line search and a relative value
criterion onstead of a fibonacci search and an absolute criterion -.

It can be more cumbersome at the start, but once some modules are made,
assembling them will be more easy, and tests will be more fun :)

In this first case, the object F1 can compute f(x), and will use
> finite differences or some more complicated method to compute
> derivatives df(x) and ddf(x) if required by the optimization
> algorithm.  In F2, the user provides options that specify options
> about how to do these computations (for example, step size h, should
> a centred difference be used?  Perhaps the function is cheap and a
> Richardson extrapolation should be used for higher accuracy.  If f is
> analytic and supports complex arguments, then the difference step
> should be h=eps*1j.  Maybe f has been implemented using an automatic
> differentiation library etc.  Just throwing out ideas here...)

Yes, that's exactly an optimizer, not a function ;)

A Function "superclass" is what I had in mind.

As I said, that would make the function a state function, so this instance
must not be shared among optimizers, so it is more error prone :(

Yes, it seems that the optimizer should maintain information about
> the history.  The question I have is about the flow of information: I
> imagine that the criterion object should be able to query the
> optimization object for the information that it needs.  We should
> define an interface of things that the optimizer can serve up to the
> various components.  This interface can be extended as required to
> support more sophisticated algorithms.

If each module returns a tuple with the result and a set of parameters used
and if the criterion gets all these sets in a dictionary, it will be able to
cope with more specific cases of steps or line searches.
For the communication between the optimizer and the modules, the only
communication is 'I want this and this' from the optimizer, and the rest is
defined when instantiating the different modules. For instance, the line
search doesn't need to know how the step is computed, and in fact neither
does the optimizer. The step knows the function; knows what it needs from
it, if it is not provided, the optimization should fail.

> That would mean that it can have a state, I really do not support
> > this approach. The Broyden _is_ a way to get a step from a function
> > that does not give some intell - Jacobian, for instance -, so it is
> > not a function thing, it is a step mode.
>
> I disagree.  I think of the Broyden algorithm as a way of maintaining
> the Jacobian.  The way to get the step is independent of this, though
> it may use the Jacobian information to help it.  The Broyden part of
> the algorithm is solely to approximate the Jacobian cheaply.

Broyden algorithm is a step, the litterature states it as a step, nothing
else. Think of 3D computer graphism. Some graphic cards - function - provide
some functions, other don't. When they are needed, the driver - the step or
the line search - provides an software approach - an approximation -, but do
not modify the GPU to add the functions.
Here, it is exactly the same, a step is an adapter to provide a step, but
never modifies the function.

Maybe I will see things differently when you do this, but I am pretty
> convinced right now that the Function() object is the best place for
> the Broyden part of the algorithm.

And now ? :)
For instance, the conjugate gradient must remember the last step. It is
exactly like Broyden algorithm, it is only simplier, but it has a state. If
the last gradient were to be stored inside the function and if this function
were to be optimized again with another starting point, the first step would
be wrong... If I have some time, I'll program the FR conjugate-gradient step
so that you can see how it is made ;)

Michael.
>
> P.S. No hurry.  I might also disappear from time to time when busy;-)

Me too ;)

Matthieu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20070416/71128c54/attachment.html>