ScientificPython - LeastSquareFit diverges

konrad.hinsen at laposte.net konrad.hinsen at laposte.net
Tue Jul 18 17:44:30 EDT 2006


On 18.07.2006, at 15:59, Harold Fellermann wrote:

>>>> def powerlaw((a,b),x) :
> ...     return a*x**b

Fitting power laws is a tricky business, you need a pretty good  
initial guess to get convergence.

> Note that I could easily fit the above data using gnuplots internal
> fitting procedure. Any idea what is going wrong here? Is it a known
> problem? Are there any work arounds or other packages?

My suggestion is to fit, at least as a first step, the logarithms of  
your data points:

import Numeric as N

def powerlaw_log((a, b), x) :
     return N.log(a) + b*N.log(x)

params1, chisq = leastSquaresFit(powerlaw_log, (10., -3.),
                                 [(x, N.log(y)) for x, y, sigma in  
data])


You can then use those parameters as starting values for fitting your  
original problem:

params2, chisq = leastSquaresFit(powerlaw, params1, data)

Doing this for your data yields:

params1: [9469.9675999067185, -2.0881423620750521]

params2: [1591.4025775162165, -1.0112284948049179]

The big difference between the two fits is a further indicator for a  
stability problem. I would trust the first set more than the second one.

As a general rule, the model to be fitted should be a smoothly  
varying function of the parameters, and the same should be true for  
the derivatives.

The second general rule is never to trust a non-linear fit algorithm  
blindly. Look at your data first, see if the model can be a good fit,  
and play with some paramater values to get a feeling for how they  
influence the fit. Plotting your data set, it is immediately clear  
that the first point ruins any nice power law behaviour. You might  
thus prefer to do the fit without the first point, and you will get a  
much better defined exponent:

	params1: [31363.301954929859, -2.4047303053979046]
	params2: [182522.2346197216, -2.9893640209815757]

Plotting the models corresponding to these two sets together with the  
data, you will see that everything coincides well for large x values,  
meaning that the first two points make all the difference - another  
pointer towards a lack of stability in the fit.

Konrad.
--
---------------------------------------------------------------------
Konrad Hinsen
Centre de Biophysique Moléculaire, CNRS Orléans
Synchrotron Soleil - Division Expériences
Saint Aubin - BP 48
91192 Gif sur Yvette Cedex, France
Tel. +33-1 69 35 97 15
E-Mail: hinsen at cnrs-orleans.fr
---------------------------------------------------------------------





More information about the Python-list mailing list