[SciPy-Dev] New Tutorial on Optimize Help
Ralf Gommers
ralf.gommers at gmail.com
Tue Oct 8 09:03:39 EDT 2019
On Mon, Oct 7, 2019 at 6:22 AM Matt Newville <newville at cars.uchicago.edu>
wrote:
> Hi Christina,
>
> On Mon, Sep 30, 2019 at 6:43 PM Christina Lee <chrissie.c.l at gmail.com>
> wrote:
>
>> Hi,
>>
>> I'm a SciPy technical writer and am currently rewriting the
>> scipy.optimize tutorial, focusing on `minimize` right now. While I've
>> gotten a grasp of the "how", I still want to explain "why". Why choose
>> one option over another? I could use information from those with more
>> experience.
>>
>
> Thank you very much for taking this on -- this is heroic work. I would
> make no claims about what "most users" use these methods for, and doubt
> anyone else really knows this either. In my experience (maintainer of the
> lmfit library), it seems that many people using `scipy.optimize` are
> actually trying to solve curve-fitting problems.
>
> Focusing on the group of methods wrapped by `minimize()` is a fine
> starting point, but you may also want to consider explaining why the user
> would choose `minimize()` over other approaches. That is, describe what
> sort of problems `minimize()` is most suitable for, which it can be useful
> for but might not be the only approach, and for which it is not the most
> suitable.
>
> The methods of `minimize()` are all scalar minimizers: the user's
> objective function returns a single value ("cost") to minimize. If their
> objective function *really* calculates an array and then the user reduces
> that array to a scalar (say as sum-of-squares or log-likelihood) then they
> are almost certainly better off using `least_squares` or `leastsq` or a
> similar solver. It also seems to me that a fair number of people use
> `minimize()` when their problem actually is or can be made linear. That
> is, it might be good to clarify when iterative methods are needed, and when
> regression methods can be used.
>
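> To make that concrete, here is a minimal sketch (with a made-up
> exponential model and synthetic data) of the same fit done both ways;
> the residual-vector route gives `least_squares` structure that a scalar
> sum-of-squares hides from `minimize()`:
>
>     import numpy as np
>     from scipy.optimize import minimize, least_squares
>
>     rng = np.random.default_rng(0)
>     xdata = np.linspace(0, 10, 50)
>     ydata = 2.5 * np.exp(-0.4 * xdata) + 0.05 * rng.standard_normal(xdata.size)
>
>     def residuals(p):
>         # Per-point residuals: model minus data.
>         return p[0] * np.exp(-p[1] * xdata) - ydata
>
>     # Scalar route: the solver only ever sees the reduced cost.
>     fit_scalar = minimize(lambda p: np.sum(residuals(p) ** 2), x0=[1.0, 1.0])
>
>     # Residual route: the solver can exploit the vector structure.
>     fit_vector = least_squares(residuals, x0=[1.0, 1.0])
>
>     print(fit_scalar.x, fit_vector.x)  # both near (2.5, 0.4)
>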
> The methods in `minimize()` are inherently local solvers. Global
> optimization is a different category, and a tutorial would do well to
> describe the difference, while also being clear-eyed that many problems
> do not require a global solver so much as a local solver plus some prior
> knowledge and understanding of the problem to be solved. That is, many
> real problems really are solved with local solvers. Many (maybe most)
> problems do start (and certainly should try to start) with a decent
> understanding of the scope of the problem. With a local solver, the task
> for the user is not a blind search, but refining variable values and trying
> to understand their correlations and uncertainties.
>
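> As an illustration, here is a minimal sketch (using the standard
> Rastrigin test function, which has many local minima) of a local solver
> started from a sensible guess next to a blind global search:
>
>     import numpy as np
>     from scipy.optimize import minimize, differential_evolution
>
>     def rastrigin(p):
>         # Many local minima; the global minimum is at p = 0.
>         return np.sum(p ** 2 + 10 * (1 - np.cos(2 * np.pi * p)))
>
>     # Prior knowledge of the problem: start near the answer.
>     local = minimize(rastrigin, x0=[0.1, -0.2])
>
>     # No prior knowledge: search the whole box.
>     bounds = [(-5.12, 5.12)] * 2
>     blind = differential_evolution(rastrigin, bounds, seed=1)
>
>     print(local.x, local.nfev)  # finds 0 with few evaluations
>     print(blind.x, blind.nfev)  # finds 0 too, at far greater cost
>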
> That is all to say that you probably should not expect too much experience
> with these topics on the part of the reader of the tutorial.
>
>
>> A lot of methods are available. Most problems can have BFGS thrown at
>> them, but I want to explain something for those other cases. Other
>> situations could have features, like constraints or non-differentiability,
>> that lend themselves to a specific method. But the module still has a lot
>> of alternatives. Are they there for academic purposes? Are they the best
>> for some problems? How could someone find that out?
>>
>>
> Those are the right questions, and they sort of answer themselves: the
> different methods represent the advance of history, with the older and
> simpler ones reflecting an aversion to storing results in memory and an
> emphasis on performance (that is, trying to find the solution with the
> least use of memory and the fewest evaluations of the objective function).
> Even now, and in this conversation, the emphasis is on performance (IMHO,
> that emphasis is to the detriment of "user-friendliness" and helping the
> user identify "correctness").
>
Thanks for this perspective Matt! Your whole email is spot on.
Cheers,
Ralf
> The fact that you asked about when the user would choose between different
> solvers and different methods for calculating derivatives means that this
> burden, which is really a mathematical detail (if an important one) of the
> method used, is put on the user of `minimize()`.
>
>
>> For derivatives, users can either provide a function or choose among
>> three different finite-difference schemes.
>>
>> When is providing a function better than finite-difference derivatives?
>> For Hessians, approximations are sometimes more efficient. How can we know
>> in advance if that's true? Is that ever true for gradients?
>>
>> How do we choose which finite-difference scheme? `3-point` and `cs` (the
>> complex-step method, if I understand correctly) have higher-order
>> accuracy, but `cs` requires an objective function that can handle complex
>> arguments. Is `3-point` ever not the way to go?
>>
>
> Some of the other responses said that the user should provide functions
> for derivatives. I respectfully disagree that this should be emphasized in
> a tutorial. If (it can be a big "if") the user can write a function for
> the derivatives (of cost with respect to each variable), that usually means
> that the objective function is differentiable, and probably "sane". This
> means in turn that finite-difference methods really ought to work (in the
> sense of "not fail"). Finite-difference Jacobians will certainly mean more
> evaluations of the objective function. Assuming that the Jacobian function
> has a runtime that is about the same as the objective function, the total
> runtime will almost certainly be less when providing a Jacobian function.
>
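> For a sense of the trade-off, here is a minimal sketch (using SciPy's
> built-in Rosenbrock test function and its analytic gradient; the string
> forms of `jac` need a reasonably recent SciPy) comparing the
> objective-function evaluation counts:
>
>     import numpy as np
>     from scipy.optimize import minimize, rosen, rosen_der
>
>     x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
>
>     # Analytic gradient: no extra objective evaluations for derivatives.
>     with_jac = minimize(rosen, x0, jac=rosen_der, method='BFGS')
>
>     # Finite differences: each gradient costs extra objective evaluations.
>     with_fd = minimize(rosen, x0, jac='3-point', method='BFGS')
>
>     print(with_jac.nfev, with_fd.nfev)  # the second count is much larger
>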
> Does reduced runtime mean that analytic Jacobians should always be
> preferred? I think it does not.
>
> The user of `scipy.optimize.minimize()` should know that run-time is far
> less precious than write- and read-time. At the very least, a Jacobian
> function should be used only when the time spent *writing* the Jacobian
> function is less than the time saved running the program. That "less than"
> might even be "much less than". Changing runtime by a factor of 5 from 1
> second to 200 milliseconds is certainly *not* worth writing a Jacobian
> (even if one types much faster and more accurately than average).
> Changing runtime from 100 hours to 20 hours might be worth spending some
> time on writing (and testing and maintaining!) the extra code of the
> Jacobian function. If you expect the program you are writing will be used
> 10,000 times, then sure, investigate a Jacobian function. Or figure out
> how to make the objective function faster. Or buy more compute power.
>
> I would hope a tutorial would emphasize other aspects of minimization over
> the technical detail and performance boost that might be found by providing
> an analytic Jacobian. These other aspects might include:
>
>   - how to compare solver methods and types of cost function
>     (least-squares, log-probability, etc).
>   - how to investigate whether the solution is only "local" or is stable
>     and global.
>   - how starting values can affect the solution and run time (see the
>     sketch after this list).
>   - how one can evaluate the quality of the solution ("goodness of fit")
>     and estimate the uncertainties in the values found for the variables.
>
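> On the point about starting values, here is a minimal sketch (with a
> made-up double-well cost) of the starting value selecting which local
> minimum `minimize()` refines toward:
>
>     from scipy.optimize import minimize
>
>     def cost(p):
>         # Double well: minima near p = -1.7 and p = +1.7; the +p term
>         # makes the left well the deeper (global) one.
>         return p[0] ** 4 - 6 * p[0] ** 2 + p[0]
>
>     left = minimize(cost, x0=[-2.0])
>     right = minimize(cost, x0=[+2.0])
>
>     print(left.x, left.fun)    # lands in the left well (the global minimum)
>     print(right.x, right.fun)  # converges to the nearby local minimum
>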
> There is the open-ended but critical question of "is the objective
> function correct", or for curve-fitting "is this mathematical model
> correct"? Related to this is the question of whether the set of variables
> used is actually complete or if there are some latent variables that the
> objective function assumes but that should actually be tested? These are
> challenging topics, but also strike me as being of more importance and
> interest to the user of `scipy.optimize` than ultimate performance. That
> is, analytic Jacobians strike me as "advanced usage" and would be a fine
> next step after a tutorial is digested.
>
> Sorry that was so long, and feel free to use or ignore any of it as
> you see fit.
> Cheers,
>
> --Matt Newville
>