[SciPy-User] robust fit

Mon May 30 15:54:57 EDT 2011

On Mon, May 30, 2011 at 11:12 AM, Robert Kern <robert.kern at gmail.com> wrote:
> On Mon, May 30, 2011 at 05:58, Piter_ <x.piter at gmail.com> wrote:
>> Hi all.
>> Can anybody point me a direction how to make a robust fit of nonlinear
>> function using leastsq.
>> Maybe someone have seen ready function doing this.
>
> A variety of robust fits are implemented in statsmodels:
>
> http://statsmodels.sourceforge.net/trunk/rlm.html

Unfortunately, RLM includes only linear models, so unless the problem
can be converted to a linear-in-parameters problem, RLM will not help
directly.

I don't know of anything that would be immediately available for this.

long answer:

From a quick Google search, it seems the same iteratively reweighted
regression method can be applied to non-linear models, but I didn't
find an internet accessible paper, and I don't know the details about
how easy it would be to add non-linear least squares instead of linear
weighted least squares to statsmodels.robust.
scipy.optimize.curve_fit allows for weights (through its sigma
argument), so it might be possible to down-weight observations with
large residuals in non-linear least squares.
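
A minimal sketch of that down-weighting idea, assuming Huber-type weights
and a MAD-based scale estimate. The exponential-decay model, the synthetic
data, the helper name irls_curve_fit and the tuning constant c=1.345 are all
illustrative assumptions, not something taken from statsmodels or scipy:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    # Hypothetical example model: exponential decay.
    return a * np.exp(-b * x)

def irls_curve_fit(f, x, y, p0, c=1.345, n_iter=20):
    """Iteratively reweighted non-linear least squares (Huber weights)."""
    params = np.asarray(p0, dtype=float)
    sigma = np.ones_like(y, dtype=float)
    for _ in range(n_iter):
        params, _ = curve_fit(f, x, y, p0=params, sigma=sigma)
        resid = y - f(x, *params)
        # Robust scale estimate from the median absolute deviation.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        u = np.abs(resid) / max(scale, 1e-12)
        # Huber weights: 1 inside the threshold, c/|u| outside it.
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        # curve_fit minimizes sum((resid / sigma)**2), so sigma = 1/sqrt(w).
        sigma = 1.0 / np.sqrt(w)
    return params

# Synthetic data with a few gross outliers (assumed for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 60)
y = model(x, 2.5, 1.3) + 0.05 * rng.standard_normal(x.size)
y[::10] += 2.0

robust_params = irls_curve_fit(model, x, y, p0=[1.0, 1.0])
plain_params, _ = curve_fit(model, x, y, p0=[1.0, 1.0])
print("robust:", robust_params, "plain:", plain_params)
```

The covariance returned by curve_fit in the last iteration treats the
weights as fixed, so it should not be read as a proper robust covariance
estimate.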

Some ways that shouldn't be too difficult to implement:

If there are clear outliers, it might be possible to identify them and
remove or trim them (trimmed least squares).
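
One way the trimming could be sketched: fit, drop the observations with the
largest absolute residuals, and refit until the kept set stops changing. The
model, the data, the helper name trimmed_curve_fit and the 20% trimming
fraction are assumptions chosen for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    # Hypothetical example model: exponential decay.
    return a * np.exp(-b * x)

def trimmed_curve_fit(f, x, y, p0, trim_frac=0.2, n_iter=10):
    """Refit repeatedly on the observations with the smallest residuals."""
    keep = np.ones(y.size, dtype=bool)
    params = np.asarray(p0, dtype=float)
    n_keep = int(np.ceil((1.0 - trim_frac) * y.size))
    for _ in range(n_iter):
        params, _ = curve_fit(f, x[keep], y[keep], p0=params)
        abs_resid = np.abs(y - f(x, *params))
        new_keep = np.zeros(y.size, dtype=bool)
        new_keep[np.argsort(abs_resid)[:n_keep]] = True
        if np.array_equal(new_keep, keep):
            break  # the kept set is stable
        keep = new_keep
    return params, keep

# Synthetic data with a few gross outliers (assumed for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 60)
y = model(x, 2.5, 1.3) + 0.05 * rng.standard_normal(x.size)
y[::10] += 2.0

trimmed_params, keep = trimmed_curve_fit(model, x, y, p0=[1.0, 1.0])
print(trimmed_params, "dropped:", np.flatnonzero(~keep))
```

The trimming fraction has to be at least as large as the share of outliers
for this to help, and trimming also discards some good observations.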

(Using optimize.fmin with a robust loss function might be possible,
but I have no idea how well it works or how to get estimates for the
covariance of the parameter estimates.)
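
A rough sketch of the fmin idea, assuming a Huber loss with a fixed
threshold delta. The model, the data and delta=0.1 are illustrative
assumptions; in practice delta would have to be tied to an estimate of the
noise scale:

```python
import numpy as np
from scipy.optimize import fmin

def model(x, a, b):
    # Hypothetical example model: exponential decay.
    return a * np.exp(-b * x)

def huber_loss(params, x, y, delta=0.1):
    """Sum of Huber losses: quadratic for small residuals, linear for large."""
    r = y - model(x, *params)
    abs_r = np.abs(r)
    quadratic = 0.5 * r**2
    linear = delta * (abs_r - 0.5 * delta)
    return np.sum(np.where(abs_r <= delta, quadratic, linear))

# Synthetic data with a few gross outliers (assumed for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 60)
y = model(x, 2.5, 1.3) + 0.05 * rng.standard_normal(x.size)
y[::10] += 2.0

# fmin is a derivative-free Nelder-Mead simplex search.
huber_params = fmin(huber_loss, x0=[1.0, 1.0], args=(x, y), disp=False)
print(huber_params)
```

fmin returns only the point estimate, so this sketch does not address the
covariance question raised above.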

Since I'm not so familiar with these robust methods but know Maximum
Likelihood estimation, what I would do is to assume that the errors
come from a non-normal distribution, either a mixture model, if some
observations might be generated by a different model, or assume that
the errors are t-distributed.
In the linear examples that I looked at, t-distributed maximum
likelihood was very robust to outliers (error distribution with heavy
tails).
It should also work quite easily (using GenericLikelihoodModel in
statsmodels), if the non-linear model is well behaved and/or good
starting values are available.
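
A minimal sketch of the t-distributed maximum likelihood idea. To stay
self-contained it uses scipy.optimize.minimize directly rather than
GenericLikelihoodModel; the model, the data, the fixed degrees of freedom
(df=3) and the log-scale parameterization are assumptions made for
illustration:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

def model(x, a, b):
    # Hypothetical example model: exponential decay.
    return a * np.exp(-b * x)

def neg_loglike(theta, x, y, df=3.0):
    """Negative log-likelihood with t-distributed errors (df held fixed)."""
    a, b, log_scale = theta
    resid = y - model(x, a, b)
    # Optimizing log(scale) keeps the scale positive without constraints.
    return -np.sum(stats.t.logpdf(resid, df, loc=0.0, scale=np.exp(log_scale)))

# Synthetic data with a few gross outliers (assumed for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 60)
y = model(x, 2.5, 1.3) + 0.05 * rng.standard_normal(x.size)
y[::10] += 2.0

res = minimize(neg_loglike, x0=[1.0, 1.0, 0.0], args=(x, y),
               method="Nelder-Mead",
               options={"maxiter": 2000, "xatol": 1e-8, "fatol": 1e-8})
a_hat, b_hat = res.x[:2]
scale_hat = np.exp(res.x[2])
print(a_hat, b_hat, scale_hat)
```

The degrees of freedom could also be estimated as an extra parameter; they
are held fixed here only to keep the sketch short. As noted above, good
starting values matter because the t likelihood is not guaranteed to have a
single optimum.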

I don't remember whether I have read this specific paper, but it's at
the top of my Google search: http://www.jstor.org/stable/2290063
(cited by 490 according to Google)

Robust Statistical Modeling Using the t Distribution
Kenneth L. Lange, Roderick J. A. Little and Jeremy M. G. Taylor

Abstract:
"The t distribution provides a useful extension of the normal for
statistical modeling of data sets involving errors with
longer-than-normal tails. An analytical strategy based on maximum likelihood
for a general model with multivariate t errors is suggested and
applied to a variety of problems, including linear and nonlinear
regression, robust estimation of the mean and covariance matrix with
missing data, unbalanced multivariate repeated-measures data,
multivariate modeling of pedigree data, and multivariate nonlinear
regression. The degrees of freedom parameter of the t distribution
provides a convenient dimension for achieving robust statistical
inference, with moderate increases in computational complexity for
many models. Estimation of precision from asymptotic theory and the
bootstrap is discussed, and graphical methods for checking the
appropriateness of the t distribution are presented."

Josef
