[SciPy-Dev] curve_fit() should require initial values for parameters

josef.pktd at gmail.com
Thu Jan 24 14:19:09 EST 2019


On Thu, Jan 24, 2019 at 1:46 PM Stefan van der Walt <stefanv at berkeley.edu>
wrote:

> Hi Josef,
>
> On Thu, 24 Jan 2019 11:26:09 -0500, josef.pktd at gmail.com wrote:
> > I think making initial values compulsory is too much of a break with
> > tradition.
> > IMO, a warning and better documentation would be more appropriate.
> >
> https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
> > does not show an example with starting values.
> > curve_fit could issue a warning if p0 is not specified, or warn if
> > convergence fails and p0 was not specified.
>
> Isn't the greater danger that convergence succeeds, with p0 unspecified,
> and the resulting model not being at all what the user had in mind?
>

Unless the optimization problem is globally convex, the user always needs
to check the results.
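To illustrate the point about checking results: a minimal sketch (the sinusoid model, data, and starting values below are made up for illustration, not from curve_fit's docs) showing an informed p0 passed explicitly rather than relying on the default of all ones:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical model for illustration: a sinusoid with unknown amplitude and frequency.
def model(x, a, b):
    return a * np.sin(b * x)

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = model(x, 1.0, 3.0) + 0.05 * rng.standard_normal(x.size)

# With the default p0 (all parameters set to 1) the optimizer can settle in a
# local minimum far from b = 3; an informed starting value avoids that.
popt, pcov = curve_fit(model, x, y, p0=[1.0, 2.8])
print(popt)  # should be close to [1.0, 3.0]
```

Whether the default start succeeds here depends on the basin of attraction, which is exactly why the result always needs checking.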



>
> > I think it should also be possible to improve the default starting
> values,
> > e.g. if the function fails or if bounds are provided.
>
> This is the type of magic I hope we can avoid.  Having different
> execution paths based on some vaguely defined notion of perceived
> failure seems dangerous at best.
>

If there is no guarantee of a global optimum, then checking and retrying is still something that either the program or the user has to do.

E.g. for statsmodels (very rough guesses on the numbers):
in 90% of cases things work fine,
in 10% of cases the data is not appropriate: singular, ill conditioned or
otherwise "not nice",
in 10% of cases the optimizer has problems and does not converge.

In this last case either the program or the user needs to do more work:
We can try different optimizers, e.g. start with Nelder-Mead before
switching to a gradient-based optimizer.
Or switch to a global optimizer from scipy, if the underlying model is
complex and might not be well behaved.
Or use the poor man's global optimizer: try out many different random or
semi-random starting values.
(And if all of that fails, go back to the drawing board and try to find a
parameterization that is better behaved.)
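The multi-start strategy (trying many random or semi-random starting values and keeping the best fit) can be sketched as follows; the model and the ranges for the starting values are made up for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical model for illustration.
def model(x, a, b):
    return a * np.sin(b * x)

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 200)
y = model(x, 1.0, 3.0) + 0.05 * rng.standard_normal(x.size)

best = None
for _ in range(20):
    # Draw a random starting point from an assumed plausible range.
    p0 = rng.uniform([0.1, 0.1], [5.0, 5.0])
    try:
        popt, _ = curve_fit(model, x, y, p0=p0, maxfev=2000)
    except RuntimeError:
        # curve_fit raises RuntimeError when the fit does not converge;
        # skip this start and try the next one.
        continue
    ssr = np.sum((y - model(x, *popt)) ** 2)
    if best is None or ssr < best[0]:
        best = (ssr, popt)

print(best[1])  # best-fitting parameters across all starts
```

This is a sketch under the stated assumptions, not a replacement for a proper global optimizer; scipy.optimize also provides dedicated global methods for harder problems.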

statsmodels switches optimizers in some cases, but in most cases it is
up to the user to change the optimizer after a convergence failure.
However, we did select the default optimizers based on which scipy optimizer
seems to work well for the various cases.
Stata also switches optimizers in some cases, and AFAIR in some cases has
an option to "try harder".
statsmodels is still missing an automatic "try harder" option that
switches optimizers on convergence failure.
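An automatic "try harder" fallback could look something like this sketch; the helper name, the method order, and the Rosenbrock-style objective are all illustrative assumptions, not statsmodels or Stata API:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical "try harder" helper: the name and method order are illustrative.
def fit_try_harder(objective, x0, methods=("BFGS", "Nelder-Mead", "Powell")):
    """Try a sequence of optimizers, returning the first converged result."""
    last = None
    for method in methods:
        res = minimize(objective, x0, method=method)
        last = res
        if res.success:
            return res
    return last  # nothing converged; the caller should inspect res.message

# Rosenbrock-style objective as a stand-in for a likelihood.
def objective(p):
    return (1 - p[0]) ** 2 + 100 * (p[1] - p[0] ** 2) ** 2

res = fit_try_harder(objective, x0=np.array([-1.2, 1.0]))
print(res.x)  # should end up near [1, 1]
```

In practice the fallback chain and convergence criteria would need tuning per model family, which is part of why such an option is nontrivial to add.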



>
> > I'm not a user of curve_fit, but I guess there might be a strong
> selection
> > bias in use cases when helping out users that run into problems.
>
> I agree; and I think this can be accomplished by better documentation,
> helpful warnings, and assisting the user in choosing correct parameters.
>

The main question for me is whether warnings and improved documentation
are enough, or whether curve_fit needs to force every user to specify
starting values.

i.e. I think
"try automatic first, and if that does not succeed, then the user has to
think again"
is more convenient than
"you have to think about your problem first, don't just hit the button".

Josef




>
> Best regards,
> Stéfan
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev
>

