[SciPy-Dev] New Tutorial on Optimize Help
Ralf Gommers
ralf.gommers at gmail.com
Tue Oct 8 09:03:39 EDT 2019
On Mon, Oct 7, 2019 at 6:22 AM Matt Newville <newville at cars.uchicago.edu>
wrote:
> Hi Christina,
>
> On Mon, Sep 30, 2019 at 6:43 PM Christina Lee <chrissie.c.l at gmail.com>
> wrote:
>
>> Hi,
>>
>> I'm a SciPy technical writer and am currently rewriting the
>> scipy.optimize tutorial, focusing on `minimize` right now. While I've
>> gotten a grasp of the "how", I still want to explain "why". Why choose
>> one option over another? I could use information from those with more
>> experience.
>>
>
> Thank you very much for taking this on -- this is heroic work. I would
> make no claims about what "most users" use these methods for, and doubt
> anyone else really knows this either. In my experience (maintainer of the
> lmfit library), it seems that many people using `scipy.optimize` are
> actually trying to solve curve-fitting problems.
>
> Focusing on the group of methods wrapped by `minimize()` is a fine
> starting point, but you may also want to consider explaining why the user
> would choose `minimize()` over other approaches. That is, describe what
> sort of problems `minimize()` is most suitable for, which it can be useful
> for but might not be the only approach, and for which it is not the most
> suitable.
>
> The methods of `minimize()` are all scalar minimizers: the user's
> objective function returns a single value ("cost") to minimize. If their
> objective function *really* calculates an array and then the user reduces
> that array to a scalar (say as sum-of-squares or log-likelihood) then they
> are almost certainly better off using `least_squares` or `leastsq` or a
> similar solver. It also seems to me that a fair number of people use
> `minimize()` when their problem actually is or can be made linear. That
> is, it might be good to clarify when iterative methods are needed, and when
> regression methods can be used.
>
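> To make that concrete, here is a minimal sketch (with a made-up
> exponential model and synthetic data) of the same fit done both ways;
> the residual-vector route gives `least_squares` structure that a scalar
> sum-of-squares hides from `minimize()`:
>
>     import numpy as np
>     from scipy.optimize import minimize, least_squares
>
>     rng = np.random.default_rng(0)
>     xdata = np.linspace(0, 10, 50)
>     ydata = 2.5 * np.exp(-0.4 * xdata) + 0.05 * rng.standard_normal(xdata.size)
>
>     def residuals(p):
>         # Per-point residuals: model minus data.
>         return p[0] * np.exp(-p[1] * xdata) - ydata
>
>     # Scalar route: the solver only ever sees the reduced cost.
>     fit_scalar = minimize(lambda p: np.sum(residuals(p) ** 2), x0=[1.0, 1.0])
>
>     # Residual route: the solver can exploit the vector structure.
>     fit_vector = least_squares(residuals, x0=[1.0, 1.0])
>
>     print(fit_scalar.x, fit_vector.x)  # both near (2.5, 0.4)
>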
> The methods in `minimize()` are inherently local solvers. Global
> optimization is a different category, and a tutorial would do well to
> describe the difference, while also being clear-eyed that many problems
> do not require a global solver so much as a local solver plus some prior
> knowledge and understanding of the problem to be solved. That is, many
> real problems really are solved with local solvers. Many (maybe most)
> problems do start (and certainly should try to start) with a decent
> understanding of the scope of the problem. With a local solver, the task
> for the user is not a blind search, but refining variable values and trying
> to understand their correlations and uncertainties.
>
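> As an illustration, here is a minimal sketch (using the standard
> Rastrigin test function, which has many local minima) of a local solver
> started from a sensible guess next to a blind global search:
>
>     import numpy as np
>     from scipy.optimize import minimize, differential_evolution
>
>     def rastrigin(p):
>         # Many local minima; the global minimum is at p = 0.
>         return np.sum(p ** 2 + 10 * (1 - np.cos(2 * np.pi * p)))
>
>     # Prior knowledge of the problem: start near the answer.
>     local = minimize(rastrigin, x0=[0.1, -0.2])
>
>     # No prior knowledge: search the whole box.
>     bounds = [(-5.12, 5.12)] * 2
>     blind = differential_evolution(rastrigin, bounds, seed=1)
>
>     print(local.x, local.nfev)  # finds 0 with few evaluations
>     print(blind.x, blind.nfev)  # finds 0 too, at far greater cost
>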
> That is all to say that you probably should not expect too much experience
> with these topics on the part of the reader of the tutorial.
>
>
>> A lot of methods are available. Most problems can have BFGS thrown at
>> them, but I want to explain something for those other cases. Other
>> situations could have features, like constraints or non-differentiability,
>> that lend themselves to a specific method. But the module still has a lot
>> of alternatives. Are they there for academic purposes? Are they the best
>> for some problems? How could someone find that out?
>>
>>
> Those are the right questions, and they sort of answer themselves: the
> different methods represent the advance of history, with the older and
> simpler ones reflecting an aversion to storing results in memory and an
> emphasis on performance (that is, trying to find the solution with the
> least use of memory and the fewest evaluations of the objective function).
> Even now, and in this conversation, the emphasis is on performance (IMHO,
> that emphasis is to the detriment of "user-friendliness" and helping the
> user identify "correctness").
>
Thanks for this perspective Matt! Your whole email is spot on.
Cheers,
Ralf
> The fact that you asked about when the user would choose between different
> solvers and different methods for calculating derivatives means that this
> burden, which is really a mathematical detail (if an important one) of the
> method used, is put on the user of `minimize()`.
>
>
>> For derivatives, users can either provide a function or choose among
>> three different finite-difference schemes.
>>
>> When is providing a function better than finite-difference derivatives?
>> For Hessians, approximations are sometimes more efficient. How can we know
>> in advance if that's true? Is that ever true for gradients?
>>
>> How do we choose which finite-difference scheme? `3-point` and `cs` (the
>> complex-step method, if I understand correctly) have higher-order
>> accuracy, but `cs` requires an objective function that can handle complex
>> arguments. Is `3-point` ever not the way to go?
>>
>
> Some of the other responses said that the user should provide functions
> for derivatives. I respectfully disagree that this should be emphasized in
> a tutorial. If (it can be a big "if") the user can write a function for
> the derivatives (of cost with respect to each variable), that usually means
> that the objective function is differentiable, and probably "sane". This
> means in turn that finite-difference methods really ought to work (in the
> sense of "not fail"). Finite-difference Jacobians will certainly mean more
> evaluations of the objective function. Assuming that the Jacobian function
> has a runtime that is about the same as the objective function, the total
> runtime will almost certainly be less when providing a Jacobian function.
>
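> For a sense of the trade-off, here is a minimal sketch (using SciPy's
> built-in Rosenbrock test function and its analytic gradient; the string
> forms of `jac` need a reasonably recent SciPy) comparing the
> objective-function evaluation counts:
>
>     import numpy as np
>     from scipy.optimize import minimize, rosen, rosen_der
>
>     x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
>
>     # Analytic gradient: no extra objective evaluations for derivatives.
>     with_jac = minimize(rosen, x0, jac=rosen_der, method='BFGS')
>
>     # Finite differences: each gradient costs extra objective evaluations.
>     with_fd = minimize(rosen, x0, jac='3-point', method='BFGS')
>
>     print(with_jac.nfev, with_fd.nfev)  # the second count is much larger
>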
> Does reduced runtime mean that analytic Jacobians should always be
> preferred? I think it does not.
>
> The user of `scipy.optimize.minimize()` should know that run-time is far
> less precious than write- and read-time. At the very least, a Jacobian
> function should be used only when the time spent *writing* the Jacobian
> function is less than the time saved running the program. That "less than"
> might even be "much less than". Changing runtime by a factor of 5 from 1
> second to 200 milliseconds is certainly *not* worth writing a Jacobian
> (even if one types much faster and more accurately than average).
> Changing runtime from 100 hours to 20 hours might be worth spending some
> time on writing (and testing and maintaining!) the extra code of the
> Jacobian function. If you expect the program you are writing will be used
> 10,000 times, then sure, investigate a Jacobian function. Or figure out
> how to make the objective function faster. Or buy more compute power.
>
> I would hope a tutorial would emphasize other aspects of minimization over
> the technical detail and performance boost that might be found by providing
> an analytic Jacobian. These other aspects might include:
>
>   - how to compare solver methods and types of cost function
>     (least-squares, log-probability, etc).
>   - how to investigate whether the solution is only "local" or is stable
>     and global.
>   - how starting values can affect the solution and run time (see the
>     sketch after this list).
>   - how one can evaluate the quality of the solution ("goodness of fit")
>     and estimate the uncertainties in the values found for the variables.
>
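> On the point about starting values, here is a minimal sketch (with a
> made-up double-well cost) of the starting value selecting which local
> minimum `minimize()` refines toward:
>
>     from scipy.optimize import minimize
>
>     def cost(p):
>         # Double well: minima near p = -1.7 and p = +1.7; the +p term
>         # makes the left well the deeper (global) one.
>         return p[0] ** 4 - 6 * p[0] ** 2 + p[0]
>
>     left = minimize(cost, x0=[-2.0])
>     right = minimize(cost, x0=[+2.0])
>
>     print(left.x, left.fun)    # lands in the left well (the global minimum)
>     print(right.x, right.fun)  # converges to the nearby local minimum
>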
> There is the open-ended but critical question of "is the objective
> function correct", or for curve-fitting "is this mathematical model
> correct"? Related to this is the question of whether the set of variables
> used is actually complete or if there are some latent variables that the
> objective function assumes but that should actually be tested? These are
> challenging topics, but also strike me as being of more importance and
> interest to the user of `scipy.optimize` than ultimate performance. That
> is, analytic Jacobians strike me as "advanced usage" and would be a fine
> next step after a tutorial is digested.
>
> Sorry that was so long, and feel free to use or ignore any of it as
> you see fit.
> Cheers,
>
> --Matt Newville
>