[SciPy-Dev] curve_fit() should require initial values for parameters

Ilhan Polat ilhanpolat at gmail.com
Tue Jan 29 11:53:37 EST 2019


> The problem I have with this is that there really is not an option to "try
> automatic first".  There is "try `np.ones(n_variables)` first".  This, or
> any other value, is really not a defensible choice for starting values.
> Starting values always depend on the function used and the data being fit.

Why not? Ones are as good as any other choice. I don't know anything about
the curve fit I will get in the end, so I don't need to pretend that I know
a good starting value. Maybe for a 3-parameter function, fine, I can come
up with an argument, but you surely don't expect me to know the starting
point if I am fitting a 7-parameter function with some esoteric structure.
At that point I am completely ignorant of how this function behaves, so not
knowing where to start is not due to my inexperience with the tools but is
inherent to the problem. My problem might even turn out to be convex, in
which case the initial value won't matter, as in the sketch below.
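
For instance (a made-up sketch; the straight-line model and data below are
just placeholders), a fit that is linear in its parameters has a convex
least-squares surface, so the silent default p0 = np.ones(2) is as good a
start as any other:

    import numpy as np
    from scipy.optimize import curve_fit

    def line(x, a, b):
        return a * x + b

    np.random.seed(0)
    x = np.linspace(0, 10, 50)
    y = line(x, 2.5, 1.0) + np.random.normal(scale=0.1, size=x.size)

    # Linear in the parameters: the problem is convex, so any starting
    # point (including the silent default np.ones(2)) finds the optimum.
    popt, pcov = curve_fit(line, x, y)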

> Currently `curve_fit` converts `p0=None` to `np.ones(n_variables)`
> without warning or explanation.  Again, I do not use `curve_fit()` myself.
> I find several aspects of it unpleasant.

It is documented in the `p0` argument docs. I use this function quite
often; that's why I don't like extra required arguments. It's annoying to
enter some random array just to please the API when I know that I am just
taking a shot in the dark. I am pretty confident that if we force this
argument, most of the people you want to educate will enter `np.zeros(n)`.
Then they will get an even weirder error; then they'll try `np.ones(n)` but
misremember `n` and get yet another error, having already been tripped up
twice by the function's parameter count. `curve_fit` is one of those
functions that you don't run just once and be done with; you run it over
and over again until you give up or are satisfied. Hence defaults matter a
lot from a UX perspective. "If you have an initial value in mind, fine,
enter it; otherwise let me do my thing" is much better than "I don't care
about your quick experiment, give me some values or I will keep tripping
you up".
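
To make the comparison concrete, here are the two call styles side by side
(the exponential model is the one from the curve_fit docs; the explicit
guess is arbitrary):

    import numpy as np
    from scipy.optimize import curve_fit

    def model(x, a, b, c):
        return a * np.exp(-b * x) + c

    np.random.seed(1)
    x = np.linspace(0, 4, 50)
    y = model(x, 2.5, 1.3, 0.5) + np.random.normal(scale=0.05, size=x.size)

    # "Let me do my thing": p0 silently becomes np.ones(3).
    popt_default, _ = curve_fit(model, x, y)

    # "I have an initial value in mind": pass it explicitly.
    popt_explicit, _ = curve_fit(model, x, y, p0=[2.0, 1.0, 0.0])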

> But this behavior strikes me as utterly wrong and a disservice to the
> scipy ecosystem.  I do not think that a documentation change is
> sufficient.

Maybe a bit overzealous?


On Mon, Jan 28, 2019 at 3:07 AM Matt Newville <newville at cars.uchicago.edu>
wrote:

> Hi All,
>
> On Thu, Jan 24, 2019 at 1:20 PM <josef.pktd at gmail.com> wrote:
>
>>
>>
>> On Thu, Jan 24, 2019 at 1:46 PM Stefan van der Walt <stefanv at berkeley.edu>
>> wrote:
>>
>>> Hi Josef,
>>>
>>> On Thu, 24 Jan 2019 11:26:09 -0500, josef.pktd at gmail.com wrote:
>>> > I think making initial values compulsory is too much of a break with
>>> > tradition.
>>> > IMO, a warning and better documentation would be more appropriate.
>>> >
>>> > https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
>>> > does not show an example with starting values.
>>> > curve_fit could issue a warning if p0 is not specified, or warn if
>>> > convergence fails and p0 was not specified.
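
A purely illustrative wrapper (not scipy's API) showing what such a warning
could look like:

    import warnings
    from scipy.optimize import curve_fit

    def curve_fit_with_warning(f, xdata, ydata, p0=None, **kwargs):
        # Hypothetical wrapper: warn before the silent p0 = np.ones(n)
        # default kicks in.
        if p0 is None:
            warnings.warn("p0 not specified; curve_fit will start from all "
                          "ones, which may be far from your problem's scale",
                          stacklevel=2)
        return curve_fit(f, xdata, ydata, p0=p0, **kwargs)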
>>>
>>> Isn't the greater danger that convergence succeeds, with p0 unspecified,
>>> and the resulting model not being at all what the user had in mind?
>>>
>>
>> Unless the optimization problem is globally convex, the user always needs
>> to check the results.
>>
>>
>>
>>>
>>> > I think it should also be possible to improve the default starting
>>> > values, e.g. if the function fails or if bounds are provided.
>>>
>>> This is the type of magic I hope we can avoid.  Having different
>>> execution paths based on some vaguely defined notion of perceived
>>> failure seems dangerous at best.
>>>
>>
>> If there is no guarantee of a global optimum, this is still what either
>> the program or the user has to do.
>>
>> E.g. for statsmodels (very rough guesses on the numbers):
>> 90% of the cases work fine;
>> 10% of the cases the data is not appropriate: singular, ill-conditioned,
>> or otherwise "not nice";
>> 10% of the cases the optimizer has problems and does not converge.
>>
>> In this last case either the program or the user needs to work more:
>> We can try different optimizers, e.g. start with Nelder-Mead before
>> switching to a gradient optimizer.
>> Or switch to a global optimizer from scipy, if the underlying model is
>> complex and might not be well behaved.
>> Or the poor man's global optimizer (sketched below): try out many
>> different random or semi-random starting values.
>> (And if all else fails, go back to the drawing board and try to find a
>> parameterization that is better behaved.)
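
A minimal sketch of that poor man's global optimizer (the restart count and
the sampling bounds lo/hi are placeholders to tune per problem):

    import numpy as np
    from scipy.optimize import curve_fit

    def multistart_fit(f, x, y, lo, hi, n_starts=20, seed=0):
        # Try several random starting vectors and keep the best fit.
        rng = np.random.RandomState(seed)
        best_popt, best_cost = None, np.inf
        for _ in range(n_starts):
            p0 = rng.uniform(lo, hi)
            try:
                popt, _ = curve_fit(f, x, y, p0=p0)
            except RuntimeError:
                # curve_fit raises RuntimeError on convergence failure.
                continue
            cost = np.sum((f(x, *popt) - y) ** 2)
            if cost < best_cost:
                best_popt, best_cost = popt, cost
        return best_popt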
>>
>> statsmodels switches optimizers in some cases, but in most cases it is
>> up to the user to change the optimizer after a convergence failure.
>> However, we did select the default optimizers based on which scipy
>> optimizer seems to work well for the various cases.
>> Stata also switches optimizers in some cases, and AFAIR has in some
>> cases an option to "try harder".
>> statsmodels is still missing an automatic "try harder" option that
>> switches optimizers on convergence failure.
>>
>>
>>
>>>
>>> > I'm not a user of curve_fit, but I guess there might be a strong
>>> > selection bias in the use cases we see when helping out users who run
>>> > into problems.
>>>
>>> I agree; and I think this can be accomplished by better documentation,
>>> helpful warnings, and assisting the user in choosing correct parameters.
>>>
>>
>> The main question for me is whether the warnings and improved
>> documentation are enough, or whether curve_fit needs to force every user to
>> specify the starting values.
>>
>
> I may not be understanding what you say about statsmodels.  Is that using
> or related to `curve_fit()`?  Perhaps it works well in many cases for
> you because of the limited range of the probability distribution functions
> being fitted?
>
> My view on this starts with the fact that initial values are actually
> required in non-linear optimization.  In a sense, not "forcing every user
> to specify starting values" and silently replacing `None` with
> `np.ones(n_variables)`  is misinforming the user.  I cannot think of any
> reason to recommend this behavior.  It will certainly fail spectacularly
> sometimes.  I would not try to guess (or probably believe anyone else's
> guess ;))  how often this would happen, but I can tell you that for
> essentially all of the fitting I do and my applications do for other users,
> giving initial values of 1 for all parameters would fail in such a way as
> to not move past the initial values (that is "not work" in a way that might
> easily confuse a novice).   Again, I do not use `curve_fit()`, but clearly
> `p0=None` fails often enough to cause confusion.
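
A concrete, made-up illustration of that failure mode: a Gaussian peak far
from the all-ones starting point leaves the solver almost no gradient to
follow, so the fit stalls (or errors out) unless a guess is read off the
data:

    import numpy as np
    from scipy.optimize import curve_fit

    def gaussian(x, amp, cen, wid):
        return amp * np.exp(-((x - cen) / wid) ** 2)

    np.random.seed(2)
    x = np.linspace(0, 100, 201)
    y = gaussian(x, 5.0, 50.0, 3.0) + np.random.normal(scale=0.1,
                                                       size=x.size)

    try:
        # Implicit p0 = np.ones(3): the model is ~0 over the whole grid,
        # so the Jacobian is ~0 and the solver cannot find the peak.
        popt_bad, _ = curve_fit(gaussian, x, y)
    except RuntimeError:
        popt_bad = None

    # A rough guess read off the data succeeds.
    p0 = [y.max(), x[np.argmax(y)], 5.0]
    popt_good, _ = curve_fit(gaussian, x, y, p0=p0)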
>
>
>> i.e. I think
>> "try automatic first, and if that does not succeed, then the user has to
>> think again"
>> is more convenient than
>> "you have to think about your problem first, don't just hit the button".
>>
>>
> The problem I have with this is that there really is not an option to "try
> automatic first".  There is "try `np.ones(n_variables)` first".   This,
> or any other value, is really not a defensible choice for starting values.
>  Starting values always depend on the function used and the data being
> fit.
>
> The user of `curve_fit` already has to provide data (about which they
> presumably know something) and write a function that models that data.  I
> think that qualifies as "has to think about their problem".  They should be
> able to make some guess ("prior belief") of the parameter values.
> Hopefully they will run their modelling function with some sensible values
> for the parameters before running `curve_fit` to make sure that their
> function runs correctly.
>
> Currently `curve_fit`  converts `p0=None` to `np.ones(n_variables)`
> without warning or explanation.  Again, I do not use `curve_fit()` myself.
> I find several aspects of it unpleasant.  But this behavior strikes me as
> utterly wrong and a disservice to the scipy ecosystem.   I do not think
> that a documentation change is sufficient.   I can believe a deprecation
> period would be reasonable, but I would hope this behavior could be removed.
>
> --Matt Newville
>