[SciPy-dev] adding mpfit to scipy optimize (Please respond)

Sat May 9 11:55:49 EDT 2009

Fri, 08 May 2009 16:09:35 -0400, william ratcliff wrote:

> Hi!  For a long time, there has been a standard package used by IDL
> users for fitting functions to data called MPFIT:
> http://cow.physics.wisc.edu/~craigm/idl/fitting.html
> <http://cow.physics.wisc.edu/%7Ecraigm/idl/fitting.html>
[clip]
> http://drop.io/mpfitdrop

Nice!

However, I see some points where the code could be improved for better 
reusability and maintainability.

If we lived in an ideal world with infinite time to polish everything, 
I'd like to see all of the points below addressed before landing this to 
Scipy. But since this would be lots of error-prone work, it's probably 
best to try to reach some compromise.

Given these constraints, I'd still like to see at least the coding style 
and error handling fixed (which probably are not too difficult to 
change), in addition to having better test coverage. The rest could come 
later, even if we accrue yet more technical debt with this...

First, broad maintenance concerns:

- We already have `leastsq` from MINPACK. Having two MINPACK-derived
  least squares fitting routines is not good.

  So, I'd perhaps like to see the `leastsq` core part extracted out of
  MPFIT, and the MPFIT interface implemented on top of it as a thin
  wrapper, or the other way around.

  Maybe, if the modifications made on MINPACK are small, they can be
  backported to the Fortran code and MPFIT can be reimplemented on top
  of `leastsq`.

  Any thoughts on this?

- What is the performance of the Python implementation as compared to the
  Fortran code? For small data sets, the Python code is probably much
  slower, but for large datasets is the performance is comparable?

- Much of the code is Fortran written in Python: long routines, 
  goto-equivalents, 6-letter variable names.

  Good commenting offset this, though.

Then more specific points of refactoring:

- The code implements QR factorization with column pivoting from scratch.

  Perhaps some of this could be replaced with suitable LAPACK routines,
  or with stuff from scipy.linalg. (Cf. DGEQPF, DGEQP3)

  I'm not sure whether there's something suitable for qrsolve in LAPACK;
  the triangular system solver could be replaced with DTRTRS.

  Anyway, it might be useful to refactor qrfac and qrsolve out of MPFIT;
  there may be other applications when it's useful to be able to solve
  ||(A + I lambda) x - b||_2 = min! efficiently for multiple different
  `lambda` in sequence.

- fdjac2 should probably be refactored out of MPFIT; other optimization 
  algorithms that need Jacobians computed by forward differences can then
  use it. Do we already have something similar in Scipy already?

- `enorm` should be replaced with BLAS xNRM2; it does appropriate scaling
  automatically and is probably much faster.

  enorm = scipy.lib.blas.get_blas_funcs(['nrm2'], [some_array])

- The long mpfit.__init__ routine should be split into smaller parts,
  the function body is too long. I'm not sure exactly what parts, though.

  Perhaps at least the covariance matrix computation and input argument
  parsing should be separated.

- "self.errmsg = 'ERROR ...; return'".
  Probably should raise exceptions instead, at least for errors in input
  parameters.

  In general, I think the error handling should make better use of the
  Python exception and warning system; using `print` is not a correct
  way to signal an error in library code.

- I'm not sure about implementing everything in a class. This tends to
  couple tightly parts of the code that wouldn't really need such a strong
  coupling.

- Does it work with complex-valued inputs?

Numpy issues:

- Use numpy.finfo instead of machar

- Lots of nonzero/put/take combos, probably dates from Numeric.

  I think these should be replaced with boolean array indexing, to
  enhance readability and performance.

- numpy.min/max -> amin/amax

Minor stylistic issues in the code:

- `catch_msg` is not used for anything

- The original fortran documentation could be moved into the method
  docstrings.

- "if foo:", not "if (foo):"

- if foo:
     bar

  not

  if foo: bar

- "return foo", not "return(foo)"

- "# comment", not "## comment"

- It's not necessary to "import types"; you can just use 
  isinstance(x, float)

[clip]
> I have written a nose-test compatible test.  Could someone look at it
> and tell me if it meets the scipy style before I continue adding more
> tests from Craig's test-suite for the C version of his program?

The test style is OK; if it works with nosetests, it should be OK. 
Suggest using "import numpy as np", though, to stay in line with the rest 
of the code.

-- 
Pauli Virtanen