[Numpy-discussion] Multiple Regression

josef.pktd at gmail.com josef.pktd at gmail.com
Thu Nov 12 20:20:35 EST 2009


On Thu, Nov 12, 2009 at 6:44 PM, Robert Kern <robert.kern at gmail.com> wrote:
> On Thu, Nov 12, 2009 at 17:38, Alexey Tigarev <alexey.tigarev at gmail.com> wrote:
>> Hi All!
>>
>> I have implemented multiple regression in the following way:
>>
>> def multipleRegression(x, y):
>>    """ Perform linear regression using least squares method.
>>
>>    x - matrix containing inputs for observations,
>>    y - vector containing one output for every observation """
>>    mulregLogger.debug("multipleRegression(x=%s, y=%s)" % (x, y))
>>    xt = transpose(x)
>>    a = dot(xt, x)     # A = xt * x
>>    b = dot(xt, y)     # B = xt * y
>>    try:
>>        return linalg.solve(a, b)
>
> Never, ever use the normal equations. :-)
>
> Use linalg.lstsq(x, y) instead.
>
>>    except linalg.LinAlgError as lae:
>>        mulregLogger.warning("Singular matrix:\n%s" % (a))
>>        mulregLogger.warning(lae)
>>        mulregLogger.warning("Determinant: %f" % (linalg.det(a)))
>>        raise
>>
>> Can you suggest a way to optimize it?
>>
>> I am using it on a large number of observations, so it is common to have
>> "x" matrix of about 5000x20 and "y" vector of length 5000, and more.
>> I also have to run that multiple times for different "y" vectors and
>> same "x" matrix.
>
> Just make a matrix "y" such that each column vector is a different
> output vector (e.g. y.shape == (5000, number_of_different_y_vectors))

Or, if you want to do it sequentially, this should work:

xpinv = np.linalg.pinv(x)

for y in all_ys:
    beta = np.dot(xpinv, y)

Note that, unlike linalg.solve, this also works for singular problems
without raising any warning.
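
A slightly fuller sketch of the pinv approach, in case it helps. The
helper name fit_many, the explicit rank check, and the random test data
are assumptions made for illustration and are not part of the code
discussed above. The rank check is just one possible way to get an error
for singular problems, since pinv will not give one.

```python
import numpy as np


def fit_many(x, all_ys):
    """Least-squares coefficients for several y vectors sharing one x.

    Hypothetical helper: computes the pseudoinverse of x once and
    reuses it for every y in all_ys.
    """
    x = np.asarray(x, dtype=float)
    # pinv handles rank-deficient x silently, so check the rank explicitly.
    rank = np.linalg.matrix_rank(x)
    if rank < x.shape[1]:
        raise np.linalg.LinAlgError(
            "x is rank deficient: rank %d < %d columns" % (rank, x.shape[1]))
    xpinv = np.linalg.pinv(x)  # shape (n_columns, n_observations)
    return [np.dot(xpinv, y) for y in all_ys]


# Made-up data with the shapes mentioned in the thread.
rng = np.random.RandomState(0)
x = rng.randn(5000, 20)
true_betas = rng.randn(20, 3)
ys = np.dot(x, true_betas) + 0.01 * rng.randn(5000, 3)

# Sequential version: iterate over the columns of ys.
betas = fit_many(x, ys.T)

# All at once: each column of ys is a separate output vector.
betas_lstsq = np.linalg.lstsq(x, ys, rcond=None)[0]
```

Both routes should give the same coefficients up to rounding. The
sequential one is mainly useful if the y vectors arrive one at a time.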

Josef

>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>  -- Umberto Eco