Linear regression in NumPy

Robert Kern robert.kern at gmail.com
Fri Mar 17 16:07:57 EST 2006


nikie wrote:
> I'm a little bit stuck with NumPy here, and neither the docs nor
> trial&error seems to lead me anywhere:
> I've got a set of data points (x/y-coordinates) and want to fit a
> straight line through them, using LMSE linear regression. Simple
> enough. I thought instead of looking up the formulas I'd just see if
> there isn't a NumPy function that does exactly this. What I found was
> "linear_least_squares", but I can't figure out what kind of parameters
> it expects: I tried passing it my array of X-coordinates and the array
> of Y-coordinates, but it complains that the first parameter should be
> two-dimensional. But well, my data is 1d. I guess I could pack the X/Y
> coordinates into one 2d-array, but then, what do I do with the second
> parameter?
> 
> More generally: Is there any kind of documentation that tells me what
> the functions in NumPy do, what parameters they expect, how to call
> them, etc.? All I found was:
> "This function returns the least-squares solution of an overdetermined
> system of linear equations. An optional third argument indicates the
> cutoff for the range of singular values (defaults to 10^-10). There are
> four return values: the least-squares solution itself, the sum of the
> squared residuals (i.e. the quantity minimized by the solution), the
> rank of the matrix a, and the singular values of a in descending
> order."
> It doesn't even mention what the parameters "a" and "b" are for...

Look at the docstring. (Note: I am using the current version of numpy from SVN;
you may be using an older version of Numeric. http://numeric.scipy.org/)

In [171]: numpy.linalg.lstsq?
Type:           function
Base Class:     <type 'function'>
String Form:    <function linear_least_squares at 0x1677630>
Namespace:      Interactive
File:           /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy-0.9.6.2148-py2.4-macosx-10.4-ppc.egg/numpy/linalg/linalg.py
Definition:     numpy.linalg.lstsq(a, b, rcond=1e-10)
Docstring:
    returns x,resids,rank,s
    where x minimizes 2-norm(|b - Ax|)
          resids is the sum square residuals
          rank is the rank of A
          s is the rank of the singular values of A in descending order

    If b is a matrix then x is also a matrix with corresponding columns.
    If the rank of A is less than the number of columns of A or greater than
    the number of rows, then residuals will be returned as an empty array
    otherwise resids = sum((b-dot(A,x)**2).
    Singular values less than s[0]*rcond are treated as zero.
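
For the straight-line case in the question, the 2-D first argument is not the X/Y pairs packed together. It is the design matrix A, with one column holding the x values and one column of ones, so that A times [slope, intercept] approximates y. The second argument b is then just the 1-D array of y values. Here is a minimal sketch of that, assuming a recent NumPy rather than the 0.9.6 build shown above: the name fit_line and the sample data are made up for illustration, and rcond=None is the modern spelling for "use the default cutoff" (the version above defaulted to rcond=1e-10).

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y ~ slope * x + intercept (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: first column is x, second column is all ones.
    A = np.column_stack([x, np.ones_like(x)])
    # lstsq returns the solution, the sum of squared residuals,
    # the rank of A, and the singular values of A.
    coeffs, resids, rank, s = np.linalg.lstsq(A, y, rcond=None)
    slope, intercept = coeffs
    return slope, intercept, resids, rank, s

# Made-up sample points lying exactly on y = 2x + 1.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
slope, intercept, resids, rank, s = fit_line(x, y)
print(slope, intercept)
```

With noisy data the same call applies unchanged; resids then holds the sum of squared residuals that the fit minimized.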

-- 
Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco



