[SciPy-dev] scipy.optimize.nonlin rewrite

Ondrej Certik ondrej at certik.cz
Mon Dec 8 12:43:47 EST 2008


Hi Pauli and all!

First, let me apologize for taking so long to reply. I wrote this
code in the first place and I am happy that Pauli has rewritten it. I
agree with the general direction, but I think this change should not
go into 0.7.0, as it changes the interface and is not well tested
yet. Also, you renamed all the working Broyden implementations that I
use to BadBroyden; I suggest naming them GoodBroyden, more below.

On Sat, Nov 29, 2008 at 8:40 PM, Pauli Virtanen <pav at iki.fi> wrote:
> Hi all,
>
> I spent some time rewriting the scipy.optimize.nonlin module:
>
>    http://github.com/pv/scipy/tree/ticket-791/scipy/optimize/nonlin.py
>
> The following things changed:
>
> - Support tolerance-based stopping conditions (cf. ticket #791)

+1

>
> - Improved handling of input and return values from the functions,
>  so that they are now easier to use.

+1

>
> - Don't use np.matrix at all.

+1

>
> - Jacobian approximations factored into classes; the iteration code is
>  now in only one place, so trust-region handling or other improvements
>  can be added to a single place later on.

+1

>
> - There's now a line search to the direction inverting the Jacobian
>  gave. (But there's no checking that it is an actual descent direction
>  for some merit function.)

+1

I would like to test how this behaves on my codes. If you have tested
it on yours, and as long as one can turn the line search off (as one
can), let's commit this and then improve it.
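For reference, the line-search idea can be sketched in a few lines (illustrative code only, with hypothetical names; this is not the nonlin.py API): backtrack along the proposed step until the merit function ||F(x)||^2 shows a sufficient decrease.

```python
import numpy as np

def backtracking_line_search(F, x, dx, sigma=1e-4, max_halvings=20):
    """Backtrack along the step dx until the merit ||F(x)||^2 decreases.

    Sketch only: accept the trial point when an Armijo-style sufficient
    decrease condition on f(x) = ||F(x)||^2 holds, halving the step
    length otherwise.
    """
    f0 = np.dot(F(x), F(x))
    t = 1.0
    for _ in range(max_halvings):
        x_new = x + t * dx
        f_new = np.dot(F(x_new), F(x_new))
        if f_new <= (1 - sigma * t) * f0:  # sufficient decrease
            return x_new
        t /= 2.0
    return x_new  # give up: return the last (smallest) trial step

# Example: one damped step for F(x) = x**2 - 2 in 1D
F = lambda x: x**2 - 2.0
x = np.array([3.0])
dx = np.array([-2.0])  # a descent direction for the merit function here
x_new = backtracking_line_search(F, x, dx)
```

As Pauli notes, without a Jacobian-based check there is no guarantee that dx is a descent direction for the merit function; the backtracking loop then simply runs out of halvings.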

>
> - Rewrote docstrings.

in general +1, but I don't understand what all the implementations
actually do. I suggest also porting my comments from the module
docstring into the respective classes. I can help you with that, after
we agree on the other things below.

>
> The routines should produce the same output as previously. The tests are
> still not very strong, however.
>
>
> But I have now some questions:
>
> * Getting this to 0.7.0; this is a complete rewrite, so is it too late,
>  and is it better to wait for 0.8?

Yes, I think it should go into 0.8.

>
> * Some of the algorithms in there don't appear to work too well, and
>  some appear to be redundant. I'd like to clean up this a bit, leaving
>  only known-good stuff in.

+1

>
> * I'd like to remove `broyden2` as the actual Jacobian approximation in
>  this appears to be the same as in `broyden3`, and there does not
>  appear to be much difference in the work involved in the two.
>
>  Ondrej, since you wrote the original code, do you think there is
>  a reason to keep both?

I think there is, at least for tutorial purposes and as an easy way
to check that all is fine, because there may be some numerical
differences. Also, over many iterations the memory consumption of
broyden3 is not bounded (is it?), whereas broyden2 just stores the
dense NxN matrix (which can of course be big), but that's it.
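To make the trade-off concrete, here is a tiny sketch (my own illustration, not the scipy code): both variants apply the same rank-1 update G <- G + (dx - G df) df^T / (df . df) to the inverse-Jacobian approximation, but one stores the dense NxN matrix while the other keeps only the rank-1 pieces, so its memory grows with the iteration count rather than with N**2.

```python
import numpy as np

rng = np.random.default_rng(0)
N, alpha = 5, 0.5
dx_list = [rng.standard_normal(N) for _ in range(3)]
df_list = [rng.standard_normal(N) for _ in range(3)]

# Dense variant: store the full N x N inverse-Jacobian approximation.
G = -alpha * np.eye(N)
for dx, df in zip(dx_list, df_list):
    G = G + np.outer(dx - G @ df, df) / (df @ df)

# Limited-memory variant: keep only rank-1 pairs (u_k, v_k) and apply
# G implicitly as -alpha*I plus the accumulated rank-1 corrections.
pairs = []

def G_matvec(w):
    r = -alpha * w
    for u, v in pairs:
        r = r + u * (v @ w)
    return r

for dx, df in zip(dx_list, df_list):
    u = (dx - G_matvec(df)) / (df @ df)
    pairs.append((u, df))

# Both representations act identically on any vector.
w = rng.standard_normal(N)
```

The dense form needs O(N**2) memory; the implicit form needs O(k*N) after k iterations, which is the point of keeping broyden3 for large problems.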

>
> * `broyden1_modified` appears to be, in the end if you work out the
>  matrix algebra, updating the inverse Jacobian in a way that
>  corresponds to
>
>      J := J + (y - J s / 2) s^T / ||s||^2
>
>  for the Jacobian (with s = dx, y = df). Apart from the factor
>  1/2, it's Broyden's good method. [1] One can also verify that the
>  updated inverse Jacobian does not satisfy the quasi-Newton condition
>  s = J^{-1} y, and that `broyden1_modified` doesn't generate the same
>  sequence as `broyden1`.
>
>  Hence, I'd like to remove this routine, unless there's some literature
>  that shows that the above works better than Broyden's method; Ondrej,
>  do you agree?

I agree.
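Pauli's observation about the factor 1/2 is easy to verify numerically: Broyden's good update makes the new Jacobian satisfy the secant (quasi-Newton) condition J_new s = y, while the variant with the 1/2 does not. A quick check (illustrative code only):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
J = rng.standard_normal((n, n))
s = rng.standard_normal(n)
y = rng.standard_normal(n)

# Broyden's good method: J := J + (y - J s) s^T / ||s||^2
J_good = J + np.outer(y - J @ s, s) / (s @ s)

# The variant above: J := J + (y - J s / 2) s^T / ||s||^2
# Here J_mod @ s = y + (J @ s) / 2, so the secant condition fails.
J_mod = J + np.outer(y - J @ s / 2, s) / (s @ s)
```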

>
>  .. [1] http://en.wikipedia.org/wiki/Broyden%27s_method
>         http://en.wikipedia.org/wiki/Sherman%E2%80%93Morrison_formula
>
> * Also, which articles were used as reference for the non-Quasi-Newton
>  algorithms:
>
>  - `broyden_modified`. This appears to be a bit exotic, and it uses
>    several magic constants (`w0`, `wl`) whose meaning is not clear to
>    me.
>
>    A reference would be helpful here, also for the user who has to
>    choose the parameters as appropriate for his/her specific problem.

You can use my master thesis:

http://ondrej.certik.cz/site_media/cookbook/master.pdf

pages 27-31. Everything is explained in there, plus references given
to the original literature.

>
>  - `broyden_generalized`, `anderson`, `anderson2`. These appear to be
>    variants of Anderson mixing, so probably we only want at most
>    one of these. Also, broyden_generalized() == anderson(w0=0), am I
>    correct?

Yes and no, see my master thesis for details and references. There are
some subtle differences. But those methods have never worked well for
me.

>
>    `anderson` and `anderson2` don't appear to function equivalently,
>    and I suspect the update formula in the latter is wrong, since this
>    algorithm can't solve any of the test problems. Do you have a
>    reference for this?

Yes, see my master thesis, pages above, it's the reference [4].

>
>    Is there a rule how `w0` should be chosen for some specific problem?

Yes, page 31 of my master thesis. But broyden2, broyden3, or
excitingmixing work much better for my problems.

>
>  - `excitingmixing`. A reference would be useful, to clarify the
>    heuristics used.

My master thesis, equation 3.28. It's the mixing used in the exciting code:

http://exciting.sourceforge.net/

I just noticed I forgot to put a reference to that code into my thesis. :(

>
>  - `linearmixing`. I'm not sure that Scipy should advocate this
>    method :)

Please don't remove this. Sometimes it's the method that works best
in electronic structure calculations, because broyden2 and broyden3
sometimes fail. It's also a good way to check that one converges to
the same solution as with the Broyden (or other) methods.
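For completeness, linear mixing is simple enough to sketch in full (a sketch of the idea only; the real routine's signature may differ). It is a plain fixed-point iteration, so it needs no memory beyond x itself, but it only converges when alpha is matched to the problem, consistent with Pauli's observation below about the sign of the Jacobian's eigenvalues relative to alpha.

```python
import numpy as np

def linearmixing_sketch(F, x0, alpha=0.5, tol=1e-10, maxiter=200):
    """Plain linear mixing: x <- x + alpha * F(x).

    Converges (roughly) when the Jacobian of F has eigenvalues of sign
    opposite to alpha, scaled so the iteration is a contraction.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(maxiter):
        fx = F(x)
        if np.linalg.norm(fx) < tol:
            break
        x = x + alpha * fx
    return x

# Solve cos(x) - x = 0; here F'(x) = -sin(x) - 1 < 0 and alpha > 0,
# so the iteration contracts toward the root near 0.7390851.
root = linearmixing_sketch(lambda x: np.cos(x) - x, np.array([1.0]))
```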

>
>  Linearmixing and excitingmixing also seem to require something from
>  the objective function, possibly that the eigenvalues of its Jacobian
>  have the opposite sign than `alpha`. For example, neither of them can
>  find a solution for the equation ``I x = 0`` where ``I`` is the
>  identity matrix (test function F2). So, I'm a bit tempted to remove
>  also these routines, as it seems they probably are not too useful for
>  general problems.

Please don't remove those routines; they are essential for electronic
structure calculations, as they are very robust and fast and don't
require any memory. They just work for many problems.

>
> * The code in there is still a bit naive implementation of the inexact
>  (Quasi-)Newton method, and one could add eg. add trust-region handling
>  or try to guarantee that the line search direction is always a
>  decrease direction for a merit function. (I don't know if it's
>  possible to do the latter unless one has a matrix representation of
>  the Jacobian approximation.) So, I suspect there are problems for
>  which eg. MINPACK code will find solutions, but for which the
>  nonlin.py code fails.

Yes, maybe they could be improved. nonlin.py is made for cases where
you cannot afford to store the full dense NxN Jacobian matrix in
memory.

>
> * One could add more algorithms suitable for large-scale problems; for
>  example some limited-memory Broyden methods (eg. [1])

I think that in the references in my thesis, broyden3 is sometimes
called the limited-memory Broyden method.
However, I checked the equations in [1] in your email, and they do a
singular value decomposition, so it seems there is yet another
approach to this. So yes.

> or Secant-Krylov
>  methods [2].

Indeed, that would be very helpful.

>
>  .. [1] http://www.math.leidenuniv.nl/~verduyn/publications/reports/equadiff.ps
>  .. [2] D. A. Knoll and D. E. Keyes. Jacobian free Newton-Krylov methods.
>         Journal of Computational Physics, 20(2):357–397, 2004.
>
>  I have implementations for both types of algorithms that could
>  possibly go in after some polishing.

Excellent, looking forward to it! I'll test it on my electronic
structure problems to see how it works.
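The Jacobian-free Newton-Krylov idea from [2] fits in a few lines (my own illustration of the technique, not the implementation being offered): the Jacobian is never formed, since its action J v can be approximated by a finite difference, and the Newton system is then solved with a Krylov method such as GMRES.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def jfnk_step(F, x, eps=1e-7):
    """One Jacobian-free Newton-Krylov step (sketch of the idea only).

    The matrix-vector product J v is approximated by the finite
    difference (F(x + eps*v) - F(x)) / eps, and the Newton system
    J dx = -F(x) is solved iteratively with GMRES.
    """
    fx = F(x)

    def matvec(v):
        return (F(x + eps * v) - fx) / eps

    J = LinearOperator((x.size, x.size), matvec=matvec, dtype=float)
    dx, info = gmres(J, -fx)
    return x + dx

# Example: for the linear residual F(x) = A x - b the finite difference
# is exact, so a single Newton step solves the system (up to the
# GMRES tolerance).
rng = np.random.default_rng(2)
A = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
b = rng.standard_normal(4)
x = jfnk_step(lambda x: A @ x - b, np.zeros(4))
```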


Let me know what you think about my comments. Basically, scipy should
have as many (working and well-tested) algorithms as possible, with
the same interface, so that one can switch algorithms and see which
one works for a particular problem. I am glad that you are also using
these methods, so that we can work on this together.

Ondrej
