[SciPy-dev] Proposal for more generic optimizers (posted before on scipy-user)

Matthieu Brucher matthieu.brucher at gmail.com
Mon Apr 16 12:07:44 EDT 2007


I suspected it would become more problematic to decouple everything, but not
that soon :)

"these == gradient/hessian information"
>
> The criterion needs access to this information, but the question is:
> who serves it?  If the "function" can compute these, then it should
> naturally serve this information.  With the Broyden method, you
> suggest that the "step" would serve this information.  Thus, there
> are two objects (depending on the choice of method) that maintain and
> provide gradient information.


I'll look in the literature for the Broyden method; once I see the whole
algorithm, I think I'll be able to answer your questions ;)


> After thinking about this some more, I am beginning to like the idea
> that only the "function" object be responsible for the Jacobian.  If
> the function can compute the Jacobian directly: great, use a newton-
> like method.  If it can't, then do its best to approximate it (i.e.
> the "Broyden" part of the algorithm would be encoded in the function
> object rather than the step object."



I think that if the function knows, on its own, how to compute the Jacobian,
the Hessian, ... it should provide it. When it does not, it shouldn't be the
person sitting at the computer who modifies their function to graft a Broyden
algorithm onto the function object. They should only tell the optimizer that
the function does not compute the Jacobian, by using another module. Which
module? That is a question for later. The goal of this is to have a clean
architecture, and adding to the function a way to compute something that does
not depend on the function, but on the step, is not a good thing.
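
Something like this is what I have in mind - a minimal sketch with made-up
names (ForwardDifferenceGradient and so on are not existing code), just to
illustrate the split between the user's function and the module that
approximates what the function cannot provide:

import numpy as np

class Function(object):
    """User-written function object: it only exposes what it really knows."""
    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        return np.sum((x - 1.0) ** 2)
    # no gradient()/jacobian() here: this function cannot compute them

class ForwardDifferenceGradient(object):
    """Separate module: wraps a function that cannot compute its own
    gradient, so the user never has to modify the function itself."""
    def __init__(self, f, eps=1e-6):
        self.f, self.eps = f, eps
    def __call__(self, x):
        return self.f(x)
    def gradient(self, x):
        x = np.asarray(x, dtype=float)
        f0 = self.f(x)
        g = np.empty_like(x)
        for i in range(x.size):
            xi = x.copy()
            xi[i] += self.eps
            g[i] = (self.f(xi) - f0) / self.eps
        return g

The optimizer would then receive ForwardDifferenceGradient(Function()) and
never needs to know whether the gradient is exact or approximated.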


The "function" object alone then serves up information about the
> value of the function at a given point, as well as the gradient and
> hessian at that point (either exact or approximate) to the criterion,
> step, and any other objects that need it.


I'm OK with it as long as it is not an approximation algorithm based on the
gradient, ... used to compute, for instance, the Hessian. Such an
approximation algorithm is generic, and as such it should be put in another
module or in a function superclass.
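
For the "function superclass" option, a hedged sketch (again, hypothetical
names): the superclass owns the generic, gradient-based approximation of the
Hessian, and a concrete function only implements what it actually knows.

import numpy as np

class FunctionWithApproximations(object):
    """Hypothetical superclass: the generic approximations live here,
    not inside each user-written function."""
    eps = 1e-6

    def gradient(self, x):
        raise NotImplementedError("subclasses provide the gradient if they can")

    def hessian(self, x):
        # generic forward-difference Hessian built on top of gradient()
        x = np.asarray(x, dtype=float)
        n = x.size
        H = np.empty((n, n))
        g0 = self.gradient(x)
        for i in range(n):
            xi = x.copy()
            xi[i] += self.eps
            H[:, i] = (self.gradient(xi) - g0) / self.eps
        return 0.5 * (H + H.T)  # symmetrize the finite-difference result

class Quadratic(FunctionWithApproximations):
    """Concrete function: knows its value and gradient, inherits hessian()."""
    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        return 0.5 * np.dot(x, x)
    def gradient(self, x):
        return np.asarray(x, dtype=float)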



> I don't think that the criterion need to access this, because it
> > would mean it knows more than it should, from an object-oriented
> > point of view, but this can be discussed :)
>
> Certain termination criteria need access to the derivatives to make
> sure that they terminate.  It would query the function object for
> this information.  Other criteria may need to query the "step" object
> to find out the size of the previous steps.



The step is not the right object for that; it is the line search object's job
to find the correct step size, and that information is given back to the
optimizer core, because that is where everything is saved - everything should
be saved with a call to recordHistory. What could be done is that every
object - step or line search - returns, along with its result - the result
being the step, the new candidate, ... - a dictionary of such values. In that
case, the criterion can pick what it needs directly from it.
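
As a rough, non-binding sketch of that contract (recordHistory is the only
name taken from the existing discussion, the rest is invented): each helper
returns its result plus a dictionary, and the optimizer core merges those
dictionaries into one state that it records and hands to the criterion.

class GradientStep(object):
    """Computes a descent direction; assumes the function object has gradient()."""
    def __call__(self, function, x, state):
        step = -function.gradient(x)
        return step, {'step': step}

class FixedLineSearch(object):
    """Trivial line search: always takes a fixed fraction of the step."""
    def __init__(self, alpha=0.1):
        self.alpha = alpha
    def __call__(self, function, x, step, state):
        candidate = x + self.alpha * step
        return candidate, {'alpha_step': self.alpha, 'candidate': candidate}

class Optimizer(object):
    """Core loop: owns the state dictionary and the history."""
    def __init__(self, function, step, line_search, criterion, x0):
        self.function, self.step = function, step
        self.line_search, self.criterion = line_search, criterion
        self.state = {'x': x0, 'iteration': 0}
        self.history = []

    def recordHistory(self, **kwargs):
        self.history.append(kwargs)

    def optimize(self):
        while not self.criterion(self.state):
            x = self.state['x']
            step, step_info = self.step(self.function, x, self.state)
            new_x, ls_info = self.line_search(self.function, x, step, self.state)
            # whatever the step and line search reported ends up in the state
            self.state.update(step_info)
            self.state.update(ls_info)
            self.state['old_x'], self.state['x'] = x, new_x
            self.state['iteration'] += 1
            self.recordHistory(**dict(self.state))
        return self.state['x']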


  The "criterion" should
> not maintain any of these internally, just rely on the values served
> by the other objects: this does not break the encapsulation, it just
> couples the objects more tightly, but sophisticated criteria need
> this coupling.



For the moment the state is not in the criterion, because one cannot know how
many times it will be called inside an optimizer. This state is maintained by
the optimizer itself - it contains the last two values and the last two sets
of parameters - but I suppose that if we have the new candidate, the step and
its size, those can be removed, and the criterion can then pick what it needs
from the dictionary.
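
Continuing the sketch above, the criterion then keeps no state of its own and
only reads the shared dictionary (keys and names are still hypothetical):

import numpy as np

class RelativeParameterChangeCriterion(object):
    """Stops when the last accepted move is small; no internal state."""
    def __init__(self, tol=1e-8, max_iterations=1000):
        self.tol, self.max_iterations = tol, max_iterations

    def __call__(self, state):
        if state['iteration'] >= self.max_iterations:
            return True
        if 'old_x' not in state:   # nothing has been recorded yet
            return False
        dx = np.asarray(state['x']) - np.asarray(state['old_x'])
        return np.linalg.norm(dx) < self.tol * (1.0 + np.linalg.norm(state['x']))

Putting the pieces together, something like
Optimizer(Quadratic(), GradientStep(), FixedLineSearch(),
RelativeParameterChangeCriterion(), np.ones(3)).optimize() would run the loop
and converge towards zero.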


> I don't think so, the function provides methods to compute
> > gradient, hessian, ... but only the step object knows what to do
> > with it : approximate a hessian, what was already approximated, ...
> > A step object is associated with one optimizer, a function object
> > can be optimized several times. If it has a state, it couldn't be
> > used with several optimizers without reinitializing it, and it is
> > not intuitive enough.
>
> The "function" object maintains all the information known about the
> function: how to compute the function, how to compute/approximate
> derivatives etc.  If the user does not supply code for directly
> computing derivatives, but wants to use an optimization method that
> makes use of gradient information, then the function object should do
> its best to provide approximate information.  The essence behind the
> Broyden methods is to approximate the Jacobian information in a
> clever and cheap way.


That would mean the function can have a state, and I really do not support
this approach. Broyden _is_ a way to get a step from a function that does not
provide some information - the Jacobian, for instance - so it is not a
function thing, it is a step mode.
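
To show where I would put that state, here is a minimal sketch (my names, not
an agreed design) of a step object for Broyden's root-finding method: the
approximate Jacobian lives in the step object, and the function is only ever
asked for its values.

import numpy as np

class BroydenStep(object):
    """Step object for Broyden's method on F(x) = 0: it owns the approximate
    Jacobian, the function object only supplies F(x)."""
    def __init__(self, n):
        self.J = np.eye(n)      # initial Jacobian approximation
        self.last_x = None
        self.last_f = None

    def __call__(self, function, x, state):
        x = np.asarray(x, dtype=float)
        f = np.asarray(function(x), dtype=float)
        if self.last_x is not None:
            dx = x - self.last_x
            df = f - self.last_f
            # "good" Broyden update: J <- J + ((df - J dx) dx^T) / (dx^T dx)
            self.J += np.outer(df - self.J.dot(dx), dx) / dx.dot(dx)
        self.last_x, self.last_f = x, f
        step = -np.linalg.solve(self.J, f)   # quasi-Newton step
        return step, {'step': step, 'approx_jacobian': self.J}

Whether this state should instead live in the function object is exactly the
point we disagree on, but at least it shows what the step object would have
to remember.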


I really think the natural place for this is in the "function"
> object, not the "step".
>
> > With these five objects, I _think_ every unconstrained method can
> > be expressed. For the constraints, I suppose the step and the line
> > search should be adapted, but no other module needs to be touched.
>
> Please describe how you think the Broyden root-finding method would
> fit within this scheme.  Which object would maintain the state of the
> approximate Jacobian?



No problem, I'll check my two optimization books this evening, provided I
have enough time - I'm a bit behind on some important work projects :| -
Thanks for your patience and your will to create generic optimizers :)

Matthieu