[SciPy-dev] adding mpfit to scipy optimize (Please respond)
Pauli Virtanen
pav at iki.fi
Sat May 9 11:55:49 EDT 2009
Fri, 08 May 2009 16:09:35 -0400, william ratcliff wrote:
> Hi! For a long time, there has been a standard package used by IDL
> users for fitting functions to data called MPFIT:
> http://cow.physics.wisc.edu/~craigm/idl/fitting.html
> <http://cow.physics.wisc.edu/%7Ecraigm/idl/fitting.html>
[clip]
> http://drop.io/mpfitdrop
Nice!
However, I see some points where the code could be improved for better
reusability and maintainability.
If we lived in an ideal world with infinite time to polish everything,
I'd like to see all of the points below addressed before landing this to
Scipy. But since this would be lots of error-prone work, it's probably
best to try to reach some compromise.
Given these constraints, I'd still like to see at least the coding style
and error handling fixed (which probably are not too difficult to
change), in addition to having better test coverage. The rest could come
later, even if we accrue yet more technical debt with this...
First, broad maintenance concerns:
- We already have `leastsq` from MINPACK. Having two MINPACK-derived
least squares fitting routines is not good.
So, I'd perhaps like to see the `leastsq` core part extracted out of
MPFIT, and the MPFIT interface implemented on top of it as a thin
wrapper, or the other way around.
Maybe, if the modifications made on MINPACK are small, they can be
backported to the Fortran code and MPFIT can be reimplemented on top
of `leastsq`.
Any thoughts on this?
- What is the performance of the Python implementation as compared to the
Fortran code? For small data sets, the Python code is probably much
slower, but for large datasets is the performance is comparable?
- Much of the code is Fortran written in Python: long routines,
goto-equivalents, 6-letter variable names.
Good commenting offset this, though.
Then more specific points of refactoring:
- The code implements QR factorization with column pivoting from scratch.
Perhaps some of this could be replaced with suitable LAPACK routines,
or with stuff from scipy.linalg. (Cf. DGEQPF, DGEQP3)
I'm not sure whether there's something suitable for qrsolve in LAPACK;
the triangular system solver could be replaced with DTRTRS.
Anyway, it might be useful to refactor qrfac and qrsolve out of MPFIT;
there may be other applications when it's useful to be able to solve
||(A + I lambda) x - b||_2 = min! efficiently for multiple different
`lambda` in sequence.
- fdjac2 should probably be refactored out of MPFIT; other optimization
algorithms that need Jacobians computed by forward differences can then
use it. Do we already have something similar in Scipy already?
- `enorm` should be replaced with BLAS xNRM2; it does appropriate scaling
automatically and is probably much faster.
enorm = scipy.lib.blas.get_blas_funcs(['nrm2'], [some_array])
- The long mpfit.__init__ routine should be split into smaller parts,
the function body is too long. I'm not sure exactly what parts, though.
Perhaps at least the covariance matrix computation and input argument
parsing should be separated.
- "self.errmsg = 'ERROR ...; return'".
Probably should raise exceptions instead, at least for errors in input
parameters.
In general, I think the error handling should make better use of the
Python exception and warning system; using `print` is not a correct
way to signal an error in library code.
- I'm not sure about implementing everything in a class. This tends to
couple tightly parts of the code that wouldn't really need such a strong
coupling.
- Does it work with complex-valued inputs?
Numpy issues:
- Use numpy.finfo instead of machar
- Lots of nonzero/put/take combos, probably dates from Numeric.
I think these should be replaced with boolean array indexing, to
enhance readability and performance.
- numpy.min/max -> amin/amax
Minor stylistic issues in the code:
- `catch_msg` is not used for anything
- The original fortran documentation could be moved into the method
docstrings.
- "if foo:", not "if (foo):"
- if foo:
bar
not
if foo: bar
- "return foo", not "return(foo)"
- "# comment", not "## comment"
- It's not necessary to "import types"; you can just use
isinstance(x, float)
[clip]
> I have written a nose-test compatible test. Could someone look at it
> and tell me if it meets the scipy style before I continue adding more
> tests from Craig's test-suite for the C version of his program?
The test style is OK; if it works with nosetests, it should be OK.
Suggest using "import numpy as np", though, to stay in line with the rest
of the code.
--
Pauli Virtanen
More information about the SciPy-Dev
mailing list