[SciPy-dev] Mpfit optimizer

Wed Aug 19 19:09:41 EDT 2009

Hi,

I received an old python version of the mpfit optimizer earlier this summer.
My advisor had previously been in contact with SciPy about including this
optimizer in the optimize package. He was sent to following message with
tips to help improve the package:

"""

> ---------- Forwarded message ----------

From: Pauli Virtanen <pav at iki.fi>

Date: Sat, May 9, 2009 at 11:55 AM

Subject: Re: [SciPy-dev] adding mpfit to scipy optimize (Please respond)

To: scipy-dev at scipy.org

>
> Fri, 08 May 2009 16:09:35 -0400, william ratcliff wrote:

> > Hi!  For a long time, there has been a standard package used by IDL

> users for fitting functions to data called MPFIT:

> http://cow.physics.wisc.edu/~craigm/idl/fitting.html

> <http://cow.physics.wisc.edu/%7Ecraigm/idl/fitting.html>

[clip]

> http://drop.io/mpfitdrop

> Nice!

> However, I see some points where the code could be improved for better

reusability and maintainability.

> If we lived in an ideal world with infinite time to polish everything,

I'd like to see all of the points below addressed before landing this to

Scipy. But since this would be lots of error-prone work, it's probably

best to try to reach some compromise.

> Given these constraints, I'd still like to see at least the coding style

and error handling fixed (which probably are not too difficult to

change), in addition to having better test coverage. The rest could come

later, even if we accrue yet more technical debt with this...

>
> First, broad maintenance concerns:

> - We already have `leastsq` from MINPACK. Having two MINPACK-derived

 least squares fitting routines is not good.

>  So, I'd perhaps like to see the `leastsq` core part extracted out of

 MPFIT, and the MPFIT interface implemented on top of it as a thin

 wrapper, or the other way around.

>  Maybe, if the modifications made on MINPACK are small, they can be

 backported to the Fortran code and MPFIT can be reimplemented on top

 of `leastsq`.

>  Any thoughts on this?

> - What is the performance of the Python implementation as compared to the

 Fortran code? For small data sets, the Python code is probably much

 slower, but for large datasets is the performance is comparable?

> - Much of the code is Fortran written in Python: long routines,

 goto-equivalents, 6-letter variable names.

>  Good commenting offset this, though.

>
> Then more specific points of refactoring:

> - The code implements QR factorization with column pivoting from scratch.

>  Perhaps some of this could be replaced with suitable LAPACK routines,

 or with stuff from scipy.linalg. (Cf. DGEQPF, DGEQP3)

>  I'm not sure whether there's something suitable for qrsolve in LAPACK;

 the triangular system solver could be replaced with DTRTRS.

>  Anyway, it might be useful to refactor qrfac and qrsolve out of MPFIT;

 there may be other applications when it's useful to be able to solve

 ||(A + I lambda) x - b||_2 = min! efficiently for multiple different

 `lambda` in sequence.

> - fdjac2 should probably be refactored out of MPFIT; other optimization

 algorithms that need Jacobians computed by forward differences can then

 use it. Do we already have something similar in Scipy already?

> - `enorm` should be replaced with BLAS xNRM2; it does appropriate scaling

 automatically and is probably much faster.

>  enorm = scipy.lib.blas.get_blas_funcs(['nrm2'], [some_array])

> - The long mpfit.__init__ routine should be split into smaller parts,

 the function body is too long. I'm not sure exactly what parts, though.

>  Perhaps at least the covariance matrix computation and input argument

 parsing should be separated.

> - "self.errmsg = 'ERROR ...; return'".

 Probably should raise exceptions instead, at least for errors in input

 parameters.

>  In general, I think the error handling should make better use of the

 Python exception and warning system; using `print` is not a correct

 way to signal an error in library code.

> - I'm not sure about implementing everything in a class. This tends to

 couple tightly parts of the code that wouldn't really need such a strong

 coupling.

> - Does it work with complex-valued inputs?

>
> Numpy issues:

> - Use numpy.finfo instead of machar

> - Lots of nonzero/put/take combos, probably dates from Numeric.

>  I think these should be replaced with boolean array indexing, to

 enhance readability and performance.

> - numpy.min/max -> amin/amax

>
> Minor stylistic issues in the code:

> - `catch_msg` is not used for anything

> - The original fortran documentation could be moved into the method

 docstrings.

> - "if foo:", not "if (foo):"

> - if foo:

    bar

>  not

>  if foo: bar

> - "return foo", not "return(foo)"

> - "# comment", not "## comment"

> - It's not necessary to "import types"; you can just use

 isinstance(x, float)

>
> [clip]

> I have written a nose-test compatible test.  Could someone look at it

> and tell me if it meets the scipy style before I continue adding more

> tests from Craig's test-suite for the C version of his program?

> The test style is OK; if it works with nosetests, it should be OK.

Suggest using "import numpy as np", though, to stay in line with the rest

of the code.

> --

Pauli Virtanen

> _______________________________________________

Scipy-dev mailing list

Scipy-dev at scipy.org

http://mail.scipy.org/mailman/listinfo/scipy-dev

"""

We cleaned up most of these issues and would like SciPy to reconsider the
package. Here is a short list of things we feel we accomplished:

- Minor stylic issues cleaned up
- Numpy issues (still a lot of nonzero calls but no more put/take combos)
- There is an approx_jacobian that uses a forward diff method in
scipy.optimize.fmin_slsq. We were not sure whether we should still factor
out our jacobian approximation. We also just checked for jacobian
approximators in SciPy - we haven't checked to see which is better, cleaner,
faster, etc.
- enorm fixed
- long init routine split in half
- Exceptions included (although a lot of them to show specific errors - this
can be changed)

Things that still need to be fixed:

- Complex numbers. We didn't have this in mind in during implementation and
still haven't thoroughly tested.
- Still written as a class. What is the best alternative? Just a simple
function?
- QR from scratch.
- Performance testing vs. Fortran code.

Please let us know if there is anymore work you would like done on this
optimizer. We use it for our own applications and it would be nice to have
it included in SciPy for others to use. Thanks for your time.

Bill Flynn
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20090819/356d0df2/attachment.html>