[SciPy-dev] scipy.optimize.nonlin rewrite

Ondrej Certik ondrej at certik.cz
Wed Dec 17 16:43:07 EST 2008


On Wed, Dec 17, 2008 at 9:55 PM, Pauli Virtanen <pav at iki.fi> wrote:
> Hi,
>
> Sorry for the delay -- busy times...
>
> First, a short summary of what I think it would be good to do:
>
> - Remove broyden1_modified
>
> - Remove broyden_generalized; the code is identical to anderson
>  with w0=0.
>
> - Remove broyden_modified; if I understand Eyert [1] correctly, this method
>  is expected to be always inferior to (extended) Anderson mixing. So I don't
>  see a real need for retaining this in Scipy. Also, I don't know how to check
>  that it works correctly.
>
> - Remove anderson2, because it does not work for any of the test problems.
>  Since `anderson` seems to work, I don't think it's wise to try to get
>  this second implementation up to speed.
>
> - Change broyden1 to update the inverse Jacobian using Sherman-Morrison.
>  No need for inverting J on every iteration.
>
> - Convert broyden3 to a simple limited-memory method.
>  Rename it, too.
>
> - Change the API so that there's a single nonlinear solver function,
>  which takes an option that determines which Jacobian approximation to use
>  (see the rough sketch after this list).
>
> - Maybe add a reverse-communication interface, see below.
>
> .. [1] V. Eyert, J. Comput. Phys. 124, 271 (1996).
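>
> To make the API point above concrete, here is a rough sketch of what a
> single solver function could look like (the name `nonlin_solve`, the option
> strings and the defaults are all placeholders, not a final interface):
>
>    import numpy as np
>
>    # Hypothetical sketch only: one entry point, Jacobian approximation
>    # selected by name.  Just two illustrative choices are wired in.
>    def nonlin_solve(F, x0, jacobian='broyden', alpha=0.1,
>                     maxiter=50, f_tol=1e-10):
>        x = np.asarray(x0, dtype=float)
>        f = np.asarray(F(x), dtype=float)
>        H = -alpha * np.eye(x.size)      # initial inverse-Jacobian guess
>        for _ in range(maxiter):
>            if np.linalg.norm(f) < f_tol:
>                return x
>            dx = -H.dot(f)               # quasi-Newton step
>            x_new = x + dx
>            f_new = np.asarray(F(x_new), dtype=float)
>            df = f_new - f
>            if jacobian == 'broyden':    # "bad" Broyden rank-1 inverse update
>                H += np.outer(dx - H.dot(df), df) / df.dot(df)
>            elif jacobian == 'linearmixing':
>                pass                     # keep H = -alpha*I fixed
>            x, f = x_new, f_new
>        raise RuntimeError("did not converge")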
>
>
> Here comes the rest:
>
> Mon, 08 Dec 2008 18:43:47 +0100, Ondrej Certik wrote:
>> On Sat, Nov 29, 2008 at 8:40 PM, Pauli Virtanen <pav at iki.fi> wrote:
> [clip]
>> First let me apologize for taking so long to reply. I wrote this code
>> in the first place and I am happy that Pauli has rewritten it. I agree
>> with the general direction, but I think this change should not go into
>> 0.7.0, as it changes the interface and it is not well tested yet. Also,
>> you renamed all the working Broyden implementations that I use to
>> BadBroyden, so I suggest naming them GoodBroyden; more below.
>
> This is probably not going to be in 0.7.0, even though the release was
> postponed. The API might still need a bit of thinking, and it's probably best
> to test this thoroughly.
>
> The difference between Broyden's good/first and bad/second methods is, to my
> understanding, the one mentioned in Wikipedia: both methods can be written as
> updates to the inverse Jacobian because the update is rank-1 so that the
> Sherman-Morrison formula applies. The difference can be seen if one looks at
> the domains (right-hand vector) of the rank-1 inverse updates: Broyden's "good"
> method uses J_{n-1}^{-T} dx, whereas the "bad" method uses df. These two
> vectors are not equal, even though both updates satisfy the quasi-Newton
> condition.
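>
> In code, the two rank-1 inverse updates would look roughly like this (H is
> the current approximation to J^{-1}; this is only meant to show where the
> two methods differ, not a tested implementation):
>
>    import numpy as np
>
>    def good_broyden_update(H, dx, df):
>        # Broyden's first ("good") method: the row vector is H^T dx
>        v = H.T.dot(dx)
>        return H + np.outer(dx - H.dot(df), v) / v.dot(df)
>
>    def bad_broyden_update(H, dx, df):
>        # Broyden's second ("bad") method: the row vector is df itself
>        return H + np.outer(dx - H.dot(df), df) / df.dot(df)
>
>    # both satisfy the secant condition H_new . df == dx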
>
> [clip]
>>> - Rewrote docstrings.
>>
>> in general +1, but I don't understand what all the implementations
>> actually do. I suggest to also port my comments from the module
>> docstring into the respective classes. I can help you with that, after
>> we agree on other things below.
>
> I'm not sure how much we can say about relative efficiency of the various
> methods. We can cite some references on this, but I'd rather not say much
> based on a couple of test problems.
>
> [clip]
>>> * I'd like to remove `broyden2` as the actual Jacobian approximation in
>>>  this appears to be the same as in `broyden3`, and there does not
>>>  appear to be much difference in the work involved in the two.
>>>
>>>  Ondrej, since you wrote the original code, do you think there is a
>>>  reason to keep both?
>>
>> I think there is, at least for tutorial purposes and also as an easy way to
>> check that all is fine, because there may be some numerical differences.
>> Also, for a lot of iterations, the memory consumption of broyden3 is not
>> limited (is it?). Broyden2 just stores the NxN dense matrix (which can of
>> course be big), but that's it.
>
> Well, I think having two implementations of the same method does not really
> make sense, and scipy.optimize should contain only production-quality code.
>
> But yes, the memory consumption of broyden3 and broyden2 is not exactly the
> same, although neither is a real limited-memory method. See below.
>
> [clip]
>>> * Also, which articles were used as reference for the non-Quasi-Newton
>>>  algorithms:
>>>
>>>  - `broyden_modified`. This appears to be a bit exotic, and it uses
>>>    several magic constants (`w0`, `wl`) whose meaning is not clear to
>>>    me.
>>>
>>>    A reference would be helpful here, also for the user who has to
>>>    choose the parameters as appropriate for his/her specific problem.
>>
>> You can use my master thesis:
>>
>> http://ondrej.certik.cz/site_media/cookbook/master.pdf
>>
>> pages 27-31. Everything is explained in there, plus references given to
>> the original literature.
>
> Looking at Eyert's paper, it seems to me that there's no need to retain
> this method.
>
>>>  - `broyden_generalized`, `anderson`, `anderson2`. These appear to be
>>>    variants of Anderson mixing, so probably we only want at most one of
>>>    these. Also, broyden_generalized() == anderson(w0=0), am I correct?
>>
>> Yes and no, see my master thesis for details and references. There are
>> some subtle differences. But those methods have never worked well for me.
>>
>>>    `anderson` and `anderson2` don't appear to function equivalently,
>>>    and I suspect the update formula in the latter is wrong, since this
>>>    algorithm can't solve any of the test problems. Do you have a
>>>    reference for this?
>>
>> Yes, see my master thesis, pages above, it's the reference [4].
>
> Ok, anderson2 and anderson should be exactly equivalent. This probably means
> that there's a bug in anderson2, or maybe the default parameters are somehow
> strange. Of the three, I'd retain only `anderson`.
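>
> For reference, plain Anderson mixing without the w0 regularization can be
> sketched as below (standard formulation: minimize the norm of the combined
> residual over the last few iterates, then mix; this is only an illustration,
> not the scipy implementation):
>
>    import numpy as np
>
>    def anderson_step(xs, fs, beta=0.5):
>        """xs, fs: lists of recent iterates and their residuals F(x)."""
>        n = len(xs)
>        if n == 1:
>            return xs[0] + beta * fs[0]   # plain mixing on the first step
>        # minimize ||sum a_i f_i|| with sum a_i = 1; eliminate the last a
>        dF = np.array([fs[i] - fs[-1] for i in range(n - 1)])
>        a_head = np.linalg.lstsq(dF.dot(dF.T), -dF.dot(fs[-1]),
>                                 rcond=None)[0]
>        a = np.append(a_head, 1.0 - a_head.sum())
>        # mixed new point: sum_i a_i (x_i + beta f_i)
>        return sum(a[i] * (xs[i] + beta * fs[i]) for i in range(n))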
>
> [clip]
>>>  - `excitingmixing`. A reference would be useful, to clarify the
>>>    heuristics used.
> [clip]
>>>  - `linearmixing`. I'm not sure that Scipy should advocate this
>>>    method :)
> [clip]
>>>  Linearmixing and excitingmixing also seem to require something from
>>>  the objective function, possibly that the eigenvalues of its Jacobian
>>>  have the opposite sign to `alpha`. For example, neither of them can
>>>  find a solution for the equation ``I x = 0`` where ``I`` is the
>>>  identity matrix (test function F2). So, I'm a bit tempted to remove
>>>  also these routines, as it seems they probably are not too useful for
>>>  general problems.
>>
>> Please don't remove those routines; they are essential for electronic
>> structure calculations, as they are very robust, fast, and don't require
>> any memory. They just work for many problems.
>
> These are pretty straightforward methods, so maybe they can stay.  But the
> docstrings should clearly say that these are intended only for specialized
> problems.
>
> I wonder if electronic structure (DFT) calculations are a bit of a special
> case. I.e., are the eigenvalues of the Jacobian of the Kohn-Sham functional F[n]
> all negative (or less than 1)?
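>
> To make the restriction concrete, a small sketch (F2 below just stands in
> for the identity-matrix test function):
>
>    import numpy as np
>
>    # Linear mixing: x_{n+1} = x_n + alpha*F(x_n).  Near a root the error
>    # obeys e_{n+1} = (I + alpha*J) e_n, so (for real eigenvalues) it
>    # contracts only if the eigenvalues of alpha*J lie in (-2, 0), i.e.
>    # J's eigenvalues must have the opposite sign to alpha.
>    def linear_mixing(F, x0, alpha, maxiter=100, tol=1e-10):
>        x = np.asarray(x0, dtype=float)
>        for _ in range(maxiter):
>            f = F(x)
>            if np.linalg.norm(f) < tol:
>                return x
>            x = x + alpha * f
>        raise RuntimeError("did not converge")
>
>    F2 = lambda x: x                   # identity Jacobian, root at x = 0
>    print(linear_mixing(F2, [1.0, 1.0], alpha=-0.5))  # converges to ~[0, 0]
>    # ... with alpha=+0.5 the same call diverges and raises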
>
> [clip]
>>> * One could add more algorithms suitable for large-scale problems; for
>>>  example some limited-memory Broyden methods (eg. [1])
>>
>> I think in the references in my thesis, broyden3 is sometimes called
>> the limited-memory Broyden method. However, I checked the equations in
>> [1] in your email and they need to do a singular value decomposition, so
>> it seems there is yet another approach to this. So yes.
>
> It's a part of a limited-memory method, but not the whole story, because the
> present implementation does not actually throw any data away.  So, it's
> an "unlimited memory" Broyden method :)
>
> But it would be very easy (= 3 lines of code) to make it a simple
> limited-memory method, just by throwing away iterates older than some M.  The
> singular value reduction can be implemented later. There are probably many
> different schemes proposed in the literature for how the reduction can be
> done...
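>
> Something along those lines, assuming the corrections are kept as a list of
> rank-1 terms (this is a hypothetical storage scheme, not necessarily what
> broyden3 does internally):
>
>    import numpy as np
>
>    M = 10                             # keep at most M rank-1 corrections
>
>    def append_update(updates, u, v, max_terms=M):
>        # H = -alpha*I + sum_i outer(u_i, v_i); drop the oldest term when full
>        updates.append((u, v))
>        if len(updates) > max_terms:
>            del updates[0]
>        return updates
>
>    def apply_inverse_jacobian(updates, f, alpha):
>        # compute H f without ever forming the dense N x N matrix
>        f = np.asarray(f, dtype=float)
>        result = -alpha * f
>        for u, v in updates:
>            result = result + u * v.dot(f)
>        return result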
>
> [clip]
>> Let me know what you think about my comments. Basically, scipy should
>> have as many (working and well tested) algorithms as possible, with the
>> same interface, so that one can change the algorithms and see which
>> works for the particular problem.
>
> Yes, but this must be balanced against maintenance + testing costs.  I want to
> be sure that the code I contribute works, and I wouldn't like to spend too much
> time working on unproven methods. So I'd rather include only methods for which
> there is reason to believe that they work well.

Yes, the only thing that I really care about is that the methods that are
known to work well for me are there. And I am willing to maintain
them.

>
>> I am glad that you are also using those methods, so that we can work
>> on this together.
>
> It'd be very useful if you could test this code, including the line search,
> on a real large-scale problem. At present, I'm not sure whether it uses too
> many iterations compared to its payoff.
>
> Also, we might need to implement a reverse-communication interface,
> something along the lines of
>
>    iterator = Solver(x0)
>    for x in iterator:
>        iterator.feed(func(x))
>        if iterator.converged(): break
>    else:
>        raise RuntimeError("Didn't converge")
>
> For expensive self-consistent iterations, this often feels like the natural
> way to use the nonlinear solver.
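>
> A toy version of such a Solver, just to make the interface above concrete
> (it only does linear mixing internally; feed/converged are the names from
> the loop above, not a settled API):
>
>    import numpy as np
>
>    class Solver(object):
>        def __init__(self, x0, alpha=-0.5, maxiter=100, f_tol=1e-10):
>            self.x = np.asarray(x0, dtype=float)
>            self.alpha, self.maxiter, self.f_tol = alpha, maxiter, f_tol
>            self.f = None
>
>        def __iter__(self):
>            for _ in range(self.maxiter):
>                yield self.x
>                # feed() has been called by the loop body at this point
>                self.x = self.x + self.alpha * self.f
>
>        def feed(self, fx):
>            self.f = np.asarray(fx, dtype=float)
>
>        def converged(self):
>            return np.linalg.norm(self.f) < self.f_tol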

Ok, I'll test it. So far I only have an atomic solver, so that's
not a large-scale problem, but if I put for example 20000 points in
the radial mesh, then I can test the nonlin methods pretty well (the
Jacobian would be 20000x20000 if we stored it fully, i.e. something
like 1.5GB).

Right now I am working on a 2D solver, and there it will show clearly
which method works best. You can follow my progress here:

http://spilka.math.unr.edu/projects/hermes2d/wiki/Projects/SchroedingerEquation

So since you'll be busy in the next couple of weeks, I think I'll know
soon which methods I need and which I don't. Thanks for all the comments
above -- I'll think about them and probably ask you soon to help me
implement or at least test some of those methods. :)

Thanks,
Ondrej


