[SciPy-User] a routine for fitting a straight line to data

Warren Weckesser warren.weckesser at gmail.com
Sun Mar 24 20:22:55 EDT 2013


On 3/24/13, David Pine <djpine at gmail.com> wrote:
> I would like to submit a routine to scipy for performing least squares
> fitting of a straight line f(x) = ax + b to an (x,y) data set.  There are a
> number of ways of doing this currently using scipy or numpy but all have
> serious drawbacks.  Here is what is currently available, as far as I can
> tell, and what seem to me to be their drawbacks.
>
> 1. numpy.polyfit :
>     a.  It is slower than it needs to be.  polyfit uses matrix methods that
> are needed to find best fits to general polynomials (quadratic, cubic,
> quartic, and higher orders), but matrix methods are overkill when you just
> want to fit a straight line f(x) = ax + b to a data set.  A direct approach can
> yield fits significantly faster.
>     b.  polyfit currently does not allow using absolute error estimates for
> weighting the data; only relative error estimates are currently possible.
> This can be fixed, but for the moment it's a problem.
>     c.  New or inexperienced users are unlikely to look for a routine to fit
> straight lines in a routine that is advertised as being for polynomials.
> This is a more important point than it may seem.  Fitting data to a straight
> line is probably the most common curve fitting task performed, and the only
> one that many users will ever use.  It makes sense to cater to such users by
> providing them with a routine that does what they want in as clear and
> straightforward a manner as possible.  I am a physics professor and have
> seen the confusion first hand with a wide spectrum of students who are new
> to Python.  It should not be this hard for them.
>
> 2. scipy.linalg.lstsq
>     a.  Using linalg.lstsq to fit a straight line is clunky and very slow
> (on the order of 10 times slower than polyfit, which is already slower than
> it needs to be).
>     b.  While linalg.lstsq can be used to fit data with error estimates
> (i.e. using weighting), how to do this is far from obvious.  It's unlikely
> that anyone but an expert would figure out how to do it.
>     c.  linalg.lstsq requires the use of matrices, which will be unfamiliar
> to some users.  Moreover, it should not be necessary to use matrices when
> the task at hand only involves one-dimensional arrays.
>
> 3. scipy.optimize.curve_fit
>     a.  This is a nonlinear fitting routine.  As such, it searches
> iteratively for a minimum of the objective function (chi-squared) rather
> than just calculating where the global minimum is using the analytical
> expressions for the best fit.  It's the wrong method for the problem,
> although it will work.
>
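
For concreteness, here is a minimal sketch of the kind of direct
(closed-form) fit described above.  The function name fit_line and its
signature are hypothetical, not an existing numpy or scipy routine, and
sigma is assumed to hold absolute one-standard-deviation uncertainties
in y.  The result is checked against numpy.polyfit with weights 1/sigma.

```python
import numpy as np


def fit_line(x, y, sigma=None):
    """Hypothetical helper: weighted least-squares fit of y = a*x + b.

    sigma, if given, is assumed to be the absolute uncertainty of each y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if sigma is None:
        w = np.ones_like(x)
    else:
        # Weight each point by 1/sigma**2; broadcast so a scalar sigma works.
        w = np.broadcast_to(1.0 / np.asarray(sigma, dtype=float) ** 2, x.shape)

    # Weighted sums that appear in the normal equations for a straight line.
    S = w.sum()
    Sx = (w * x).sum()
    Sy = (w * y).sum()
    Sxx = (w * x * x).sum()
    Sxy = (w * x * y).sum()

    delta = S * Sxx - Sx ** 2
    a = (S * Sxy - Sx * Sy) / delta    # slope
    b = (Sxx * Sy - Sx * Sxy) / delta  # intercept
    return a, b


# Made-up illustrative data.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
sigma = np.array([0.1, 0.2, 0.1, 0.3, 0.2])

a, b = fit_line(x, y, sigma)

# Reference: polyfit multiplies the residuals by w, so w = 1/sigma.
a_ref, b_ref = np.polyfit(x, y, 1, w=1.0 / sigma)
```

Only one-dimensional arrays and a handful of sums are involved, which
is the point made in items 1a and 2c.  The sketch returns the best-fit
parameters only; uncertainties on a and b are left out.
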
> Questions:  What do others in the scientific Python community think about
> the need for such a routine?   Where should a routine to fit data to a
> straight line go?  It would seem to me that it should go in the
> scipy.optimize package, but I wonder what others think.
>
> David Pine


David,

There is also scipy.stats.linregress, which is a basic 1-D (i.e. x and
y are 1-D vectors) linear regression.
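
A minimal usage sketch, with made-up data, comparing the result with
numpy.polyfit.  As far as I know linregress takes no error estimates
for weighting, so this covers the unweighted case only.

```python
import numpy as np
from scipy import stats

# Made-up illustrative data.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# linregress takes the 1-D arrays directly; no design matrix is needed.
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

# The same unweighted fit via polyfit, for comparison.
slope_pf, intercept_pf = np.polyfit(x, y, 1)

print(slope, intercept)
print(slope_pf, intercept_pf)
```
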

Warren
