[SciPy-dev] adding mpfit to scipy optimize (Please respond)

Sun May 10 01:21:32 EDT 2009

Thanks for the feedback!   Next week is rather busy for me, but I will try
to improve the exception handling and look into calling portions of minpack
directly (ex qr factorization) --but are they currently exposed to python?

Thanks,
William

On Sat, May 9, 2009 at 11:55 AM, Pauli Virtanen <pav at iki.fi> wrote:

> Fri, 08 May 2009 16:09:35 -0400, william ratcliff wrote:
>
> > Hi!  For a long time, there has been a standard package used by IDL
> > users for fitting functions to data called MPFIT:
> > http://cow.physics.wisc.edu/~craigm/idl/fitting.html<http://cow.physics.wisc.edu/%7Ecraigm/idl/fitting.html>
> > <http://cow.physics.wisc.edu/%7Ecraigm/idl/fitting.html>
> [clip]
> > http://drop.io/mpfitdrop
>
> Nice!
>
> However, I see some points where the code could be improved for better
> reusability and maintainability.
>
> If we lived in an ideal world with infinite time to polish everything,
> I'd like to see all of the points below addressed before landing this to
> Scipy. But since this would be lots of error-prone work, it's probably
> best to try to reach some compromise.
>
> Given these constraints, I'd still like to see at least the coding style
> and error handling fixed (which probably are not too difficult to
> change), in addition to having better test coverage. The rest could come
> later, even if we accrue yet more technical debt with this...
>
>
> First, broad maintenance concerns:
>
> - We already have `leastsq` from MINPACK. Having two MINPACK-derived
>  least squares fitting routines is not good.
>
>  So, I'd perhaps like to see the `leastsq` core part extracted out of
>  MPFIT, and the MPFIT interface implemented on top of it as a thin
>  wrapper, or the other way around.
>
>  Maybe, if the modifications made on MINPACK are small, they can be
>  backported to the Fortran code and MPFIT can be reimplemented on top
>  of `leastsq`.
>
>  Any thoughts on this?
>
> - What is the performance of the Python implementation as compared to the
>  Fortran code? For small data sets, the Python code is probably much
>  slower, but for large datasets is the performance is comparable?
>
> - Much of the code is Fortran written in Python: long routines,
>  goto-equivalents, 6-letter variable names.
>
>  Good commenting offset this, though.
>
>
> Then more specific points of refactoring:
>
> - The code implements QR factorization with column pivoting from scratch.
>
>  Perhaps some of this could be replaced with suitable LAPACK routines,
>  or with stuff from scipy.linalg. (Cf. DGEQPF, DGEQP3)
>
>  I'm not sure whether there's something suitable for qrsolve in LAPACK;
>  the triangular system solver could be replaced with DTRTRS.
>
>  Anyway, it might be useful to refactor qrfac and qrsolve out of MPFIT;
>  there may be other applications when it's useful to be able to solve
>  ||(A + I lambda) x - b||_2 = min! efficiently for multiple different
>  `lambda` in sequence.
>
> - fdjac2 should probably be refactored out of MPFIT; other optimization
>  algorithms that need Jacobians computed by forward differences can then
>  use it. Do we already have something similar in Scipy already?
>
> - `enorm` should be replaced with BLAS xNRM2; it does appropriate scaling
>  automatically and is probably much faster.
>
>  enorm = scipy.lib.blas.get_blas_funcs(['nrm2'], [some_array])
>
> - The long mpfit.__init__ routine should be split into smaller parts,
>  the function body is too long. I'm not sure exactly what parts, though.
>
>  Perhaps at least the covariance matrix computation and input argument
>  parsing should be separated.
>
> - "self.errmsg = 'ERROR ...; return'".
>  Probably should raise exceptions instead, at least for errors in input
>  parameters.
>
>  In general, I think the error handling should make better use of the
>  Python exception and warning system; using `print` is not a correct
>  way to signal an error in library code.
>
> - I'm not sure about implementing everything in a class. This tends to
>  couple tightly parts of the code that wouldn't really need such a strong
>  coupling.
>
> - Does it work with complex-valued inputs?
>
>
> Numpy issues:
>
> - Use numpy.finfo instead of machar
>
> - Lots of nonzero/put/take combos, probably dates from Numeric.
>
>  I think these should be replaced with boolean array indexing, to
>  enhance readability and performance.
>
> - numpy.min/max -> amin/amax
>
>
> Minor stylistic issues in the code:
>
> - `catch_msg` is not used for anything
>
> - The original fortran documentation could be moved into the method
>  docstrings.
>
> - "if foo:", not "if (foo):"
>
> - if foo:
>     bar
>
>  not
>
>  if foo: bar
>
> - "return foo", not "return(foo)"
>
> - "# comment", not "## comment"
>
> - It's not necessary to "import types"; you can just use
>  isinstance(x, float)
>
>
> [clip]
> > I have written a nose-test compatible test.  Could someone look at it
> > and tell me if it meets the scipy style before I continue adding more
> > tests from Craig's test-suite for the C version of his program?
>
> The test style is OK; if it works with nosetests, it should be OK.
> Suggest using "import numpy as np", though, to stay in line with the rest
> of the code.
>
> --
> Pauli Virtanen
>
> _______________________________________________
> Scipy-dev mailing list
> Scipy-dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20090510/dc08e674/attachment.html>