[SciPy-Dev] Minimizer in scipy.optimize

Sun Mar 18 22:36:33 EDT 2012

Hi Martin, All,

On March 16, 2012 18:33, Martin Teichmann wrote:
> Hello list,
>
> I had been working on a mostly-python implementation of the
> Levenberg-Marquardt algorithm for data fitting, which I put here:
>
> https://github.com/scipy/scipy/pull/90
>
> one of my main goals was to make it more flexible and usable
> than the FORTRAN version we have in scipy right now. So
> I took an object-oriented approach, where you inherit from a
> fitter class and reimplement the function to fit. Some convenience
> functions around makes this approach very simple, the
> most simple version is using a decorator, say you want
> to fit your data to a gaussian, you would write:
>
> @fitfunction(width=3, height=2, position=4)
> def gaussian(x, width, height, position):
>   # some code to calculate gaussian here
>
> gaussian.fit(xdata, ydata, width=2, height=1)
>
> that's it! I would like to have some comments about it.

Sorry to barge into this thread if unwelcome -- please view these as
(attempted) constructive criticism from afar.

I think this is an interesting design, but not without some issues.
Principally, what are "xdata" and "ydata" doing?  Presumably, you're
assigning 'xdata' to 'x' in gaussian(), but this is either slightly
simplistic or opaque.  And I would guess that "ydata" is the data
to compare / fit to the gaussian.

As Denis said, the point is to minimize a multi-variate function, not
fit data to a simple model.  In fact, the abscissa "x"/"xdata"
shouldn't be passed in as a primary array -- it's extra data to help
calculate the model.
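
To make that concrete, here is a minimal sketch of an objective function
in the style scipy.optimize.leastsq() expects: the variables come first,
and everything else arrives as extra arguments.  The names (residual,
amp, cen, wid) and the data are made up for illustration and are not
taken from the pull request.

```python
import math

def gaussian(x, amp, cen, wid):
    """Model: a Gaussian of height amp, centre cen, width wid."""
    return amp * math.exp(-((x - cen) / wid) ** 2 / 2)

def residual(params, x, data):
    """Objective: the variables come first, extra data after.

    x and data are just two of possibly many extra arguments.
    """
    amp, cen, wid = params
    return [gaussian(xi, amp, cen, wid) - di for xi, di in zip(x, data)]

# Made-up data, generated from known values purely for illustration.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
data = [gaussian(xi, 2.0, 2.0, 1.0) for xi in x]

exact = residual([2.0, 2.0, 1.0], x, data)  # residual at the generating values
off = residual([1.0, 2.0, 1.0], x, data)    # residual with the wrong height
```

With scipy this would be handed over as leastsq(residual, p0, args=(x, data)),
so "x" is simply one more piece of extra data, not a privileged argument.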

It's also difficult to tell what the fitfunction arguments are
doing.  Are they setting default values?  If so, by leaving 'position'
unspecified in gaussian.fit(), is position fixed at 4?  Or is it the
other way round -- position is fit, width and height are fixed?
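
To show why the distinction matters, here is a sketch of just one of the
two readings: keywords given at fit time are starting values for varied
parameters, and anything left out stays fixed at the decorator value.
This is my guess at a possible behaviour, not the code in the pull
request; fitfunction_sketch and split() are hypothetical names.

```python
import math

def fitfunction_sketch(**defaults):
    """Hypothetical decorator: remember default values for each parameter."""
    def wrap(func):
        def split(**start):
            unknown = set(start) - set(defaults)
            if unknown:
                raise TypeError("unknown parameters: %s" % sorted(unknown))
            # Reading 1: what is passed in is varied, the rest is held fixed.
            varied = dict(start)
            fixed = {k: v for k, v in defaults.items() if k not in start}
            return varied, fixed
        func.defaults = defaults
        func.split = split
        return func
    return wrap

@fitfunction_sketch(width=3, height=2, position=4)
def gaussian(x, width, height, position):
    return height * math.exp(-((x - position) / width) ** 2 / 2)

varied, fixed = gaussian.split(width=2, height=1)
```

Swapping the roles of "varied" and "fixed" gives the opposite reading
with the same call signature, which is why I think the choice needs to
be spelled out.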

Keyword params for variable names seem clever, and may be workable,
but the objective function needs to be able to have other data passed
in as well (such as you have passed in xdata and ydata), and this
cannot be limited to "abscissa values" and "data to subtract from
model" -- that is far too restrictive.

Using keyword parameters for variable names instead of a list of
variables as the first argument of the objective function does seem
interesting.
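
For what it's worth, the two styles can be bridged.  The sketch below
wraps a keyword-style function so that it accepts a flat list of
variables, with any other arguments treated as extra data.  The helper
as_vector_objective and the function misfit are hypothetical names of
mine, and the data are made up.

```python
import inspect
import math

def as_vector_objective(func, **extra):
    """Wrap a keyword-style function so it takes a flat list of variables.

    Every argument of func not supplied in extra is treated as a variable,
    in the order it appears in the signature.
    """
    names = [n for n in inspect.signature(func).parameters if n not in extra]

    def objective(p):
        if len(p) != len(names):
            raise ValueError("expected %d variables" % len(names))
        return func(**dict(zip(names, p)), **extra)

    return objective, names

def misfit(width, height, position, x, data):
    """Sum of squared differences between a Gaussian and the data."""
    return sum(
        (height * math.exp(-((xi - position) / width) ** 2 / 2) - di) ** 2
        for xi, di in zip(x, data)
    )

x = [0.0, 1.0, 2.0]
data = [0.0, 1.0, 0.0]
objective, names = as_vector_objective(misfit, x=x, data=data)
value = objective([1.0, 1.0, 1.0])  # width, height, position
```

Here the extra data can be anything at all, which is the flexibility I
am asking for.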

> While working on it, I have been pointed to two different
> related efforts, the first being here: http://newville.github.com/lmfit-py/
> Matthew Newville wrote this trying to avoid clumsy unreadable
> fitting routines like that:
>
> def gaussian(x, p):
>     return p[0] * exp(-((x - p[1]) / p[2])**2 / 2)
>
> He's right that that's ugly, unfortunately, I think his solution
> is not much better, this is why I didn't take his route.

I think that's not quite a correct characterization of the motivations
for lmfit.  It is not because I think using a list/array of variables
in the first argument is "ugly".  lmfit intentionally uses a call
signature for the objective function that is similar to
scipy.optimize.leastsq().  But lmfit abstracts numerical variables to
a Parameter object that has bounds, a flag to set whether it's fixed or
not, or an expression used to evaluate it in terms of the other
Parameters.  So, it might be just as ugly as scipy.optimize.leastsq(),
but it's trying to solve a problem that your solution doesn't seem to
address.
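
As a rough illustration of that idea (this is a stand-alone sketch, not
lmfit's actual code or API), a parameter object can carry bounds, a vary
flag, and an optional constraint expression.  ParamSketch and resolve()
are hypothetical names, and for simplicity an expression here may only
refer to parameters that have no expression of their own.

```python
class ParamSketch:
    """One fit variable: a value with bounds, a vary flag, or an expression."""

    def __init__(self, value=None, vary=True, lo=None, hi=None, expr=None):
        self.value = value
        self.vary = vary and expr is None  # a constrained parameter is never varied
        self.lo = lo
        self.hi = hi
        self.expr = expr

def resolve(params):
    """Return {name: value}, clipping to bounds and evaluating expressions."""
    values = {}
    for name, p in params.items():
        if p.expr is None:
            v = p.value
            if p.lo is not None:
                v = max(v, p.lo)
            if p.hi is not None:
                v = min(v, p.hi)
            values[name] = v
    for name, p in params.items():
        if p.expr is not None:
            # Evaluate in terms of the plain parameters only (single pass).
            values[name] = eval(p.expr, {"__builtins__": {}}, dict(values))
    return values

params = {
    "amp": ParamSketch(5.0, lo=0.0, hi=3.0),   # bounded, varied
    "wid": ParamSketch(1.5, vary=False),       # held fixed
    "fwhm": ParamSketch(expr="2.3548 * wid"),  # tied to wid
}
values = resolve(params)
varied = [name for name, p in params.items() if p.vary]
```

Only the names in "varied" would be handed to the minimizer; the fixed
and constrained ones are filled in before the objective function is
called.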

I'm not sure I see the benefit of rewriting MINPACK with cython, but
that seems like a separate issue from the design of the objective
function.

Anyway, I'm intrigued by using keyword params to identify Parameters,
but I think there might be some details to work out.

Cheers,

--Matt Newville