[Numpy-discussion] lstsq functionality

Keith Goodman kwgoodman at gmail.com
Tue Jul 20 22:24:51 EDT 2010


On Tue, Jul 20, 2010 at 7:16 PM, Skipper Seabold <jsseabold at gmail.com> wrote:
> On Tue, Jul 20, 2010 at 10:12 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
>> On Tue, Jul 20, 2010 at 6:35 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
>>> On Mon, Jul 19, 2010 at 10:08 PM, Charles R Harris
>>> <charlesr.harris at gmail.com> wrote:
>>>>
>>>>
>>>> On Mon, Jul 19, 2010 at 9:40 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
>>>>>
>>>>> On Mon, Jul 19, 2010 at 8:27 PM, Charles R Harris
>>>>> <charlesr.harris at gmail.com> wrote:
>>>>> >
>>>>> >
>>>>> > On Mon, Jul 19, 2010 at 9:02 PM, Keith Goodman <kwgoodman at gmail.com>
>>>>> > wrote:
>>>>> >>
>>>>> >> On Mon, Jul 19, 2010 at 6:53 PM, Joshua Holbrook
>>>>> >> <josh.holbrook at gmail.com> wrote:
>>>>> >> > On Mon, Jul 19, 2010 at 5:50 PM, Charles R Harris
>>>>> >> > <charlesr.harris at gmail.com> wrote:
>>>>> >> >> Hi All,
>>>>> >> >>
>>>>> >> >> I'm thinking about adding some functionality to lstsq because I find
>>>>> >> >> myself
>>>>> >> >> doing the same fixes over and over. List follows.
>>>>> >> >>
>>>>> >> >> 1. Add weights so data points can be weighted.
>>>>> >> >> 2. Use column scaling so condition numbers make more sense.
>>>>> >> >> 3. Compute a covariance approximation?
>>>>> >> >>
>>>>> >> >> Unfortunately, the last will require using svd, since there are
>>>>> >> >> no linear least squares routines in LAPACK that also compute the
>>>>> >> >> covariance, at least none that google knows about.
>>>>> >> >>
>>>>> >> >> Thoughts?
>>>>> >> >
>>>>> >> > Maybe make 2 functions--one which implements 1 and 2, and another
>>>>> >> > which implements 3? I think weights is an excellent idea!
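
For the weights in 1., nothing new is strictly required: scaling each row
of the design matrix and the rhs by the square root of its weight turns
the problem into an ordinary lstsq. A minimal sketch (the wlstsq name and
signature are illustrative, not a proposed API):

import numpy as np

def wlstsq(a, b, w):
    # Weighted least squares: minimizing sum(w_i * (b_i - dot(a_i, x))**2)
    # is the same as an ordinary lstsq on sqrt(w)*a and sqrt(w)*b.
    sw = np.sqrt(w)
    return np.linalg.lstsq(a * sw[:, np.newaxis], b * sw)

This assumes a 1-d rhs; a 2-d b would need sw[:, np.newaxis] on the
right-hand side as well.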
>>>>> >>
>>>>> >> I'd like an lstsq that does less, like not calculating the sum of
>>>>> >> squared residuals. That's useful in tight loops. So I also think
>>>>> >> having two lstsqs makes sense: one as basic as possible, one with
>>>>> >> bells. How does scipy's lstsq fit into all this?
>>>>> >
>>>>> > I think the computation of the residues is cheap in lstsq. The
>>>>> > algorithm used starts by reducing the design matrix to bidiagonal
>>>>> > form and reduces the rhs at the same time. In other words, an mxn
>>>>> > problem becomes an (n+1)xn problem. That's why the summed square of
>>>>> > residuals is available but not the individual residuals: after the
>>>>> > reduction there is only one residual, and its square is the residue.
>>>>>
>>>>> That does sound good. But it must take some time. There's indexing,
>>>>> array creation, an if statement, and summing:
>>>>>
>>>>>        if results['rank'] == n and m > n:
>>>>>            resids = sum((transpose(bstar)[n:,:])**2, axis=0).astype(result_t)
>>>>>
>>>>> Here are the timings after removing the sum of squared residuals:
>>>>>
>>>>> >> x = np.random.rand(1000,10)
>>>>> >> y = np.random.rand(1000)
>>>>> >> timeit np.linalg.lstsq(x, y)
>>>>> 1000 loops, best of 3: 369 us per loop
>>>>> >> timeit np.linalg.lstsq2(x, y)
>>>>> 1000 loops, best of 3: 344 us per loop
>>>>> >>
>>>>> >> x = np.random.rand(10,2)
>>>>> >> y = np.random.rand(10)
>>>>> >> timeit np.linalg.lstsq(x, y)
>>>>> 10000 loops, best of 3: 102 us per loop
>>>>> >> timeit np.linalg.lstsq2(x, y)
>>>>> 10000 loops, best of 3: 77 us per loop
>>>>
>>>> Now that I've looked through the driver program I see that you are right.
>>>> Also, all the info needed for the covariance is almost available in the
>>>> LAPACK driver program. Hmm, it seems that maybe the best thing to do here is
>>>> to pull out the lapack_lite driver program, modify it, and make it a
>>>> standalone python module that links to either the lapack_lite programs or
>>>> the ATLAS versions.  But that is more work than just doing things in python
>>>> :-\ It does have the added advantage that all the work arrays can be handled
>>>> internally.
>>>
>>> While taking a quick look at lstsq to see what I could cut, this line
>>> caught my eye:
>>>
>>> bstar[:b.shape[0],:n_rhs] = b.copy()
>>
>> Even if bstar contained a view of b, the next line in lstsq would take
>> care of it:
>>
>> a, bstar = _fastCopyAndTranspose(t, a, bstar)
>>
>
> If a view is never taken, then what does the copy method actually do?
> A redundant copy?  The b matrix to *gelsd is in/out, so if the
> original b is not going to be written to then taking out the redundant
> copy seems like a good idea.

Good point. Looks like we can get rid of two copies! I didn't get rid of
the second copy, but I did cut things down just to see what the timing
was like. I also threw out the ability to handle complex numbers (which
only saves an iscomplex check):

>> x = np.random.rand(10,2)
>> y = np.random.rand(10)
>>
>> timeit np.linalg.lstsq(x, y)
10000 loops, best of 3: 99.6 us per loop
>> timeit np.linalg.lstsq3(x, y)
10000 loops, best of 3: 61 us per loop
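
For callers that still want the residue, it is cheap to recompute from
the returned solution (it equals the sum of squared residuals whenever
the system is overdetermined and full rank). A minimal sketch; the
residue helper is mine, not anything in numpy:

import numpy as np

def residue(a, b, coef):
    # Sum of squared residuals, ||b - dot(a, coef)||**2, summed down each
    # column; matches the resids returned by the full lstsq for an
    # overdetermined, full-rank system.
    r = b - np.dot(a, coef)
    return (r**2).sum(axis=0)

x = np.random.rand(1000, 10)
y = np.random.rand(1000)
coef, resids, rank, sv = np.linalg.lstsq(x, y)
assert np.allclose(residue(x, y, coef), resids)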

Here's what lstsq3 looks like (any other tweaks possible?):

import math

def lstsq3(a, b, rcond=-1):
    # Stripped-down least squares: real double precision only, no rank,
    # residue, or singular-value bookkeeping. Written to live inside
    # numpy/linalg/linalg.py, so it leans on that module's private helpers
    # (_makearray, _commonType, _fastCopyAndTranspose, ...) and lapack_lite.
    a, _ = _makearray(a)
    b, wrap = _makearray(b)
    is_1d = len(b.shape) == 1
    if is_1d:
        b = b[:, newaxis]
    _assertRank2(a, b)
    m = a.shape[0]
    n = a.shape[1]
    n_rhs = b.shape[1]
    ldb = max(n, m)
    if m != b.shape[0]:
        raise LinAlgError('Incompatible dimensions')
    t, result_t = _commonType(a, b)
    real_t = _linalgRealType(t)
    # dgelsd overwrites its rhs argument, so stage b in a scratch array
    # that is also large enough (ldb x n_rhs) to hold the solution.
    bstar = zeros((ldb, n_rhs), t)
    bstar[:b.shape[0], :n_rhs] = b
    a, bstar = _fastCopyAndTranspose(t, a, bstar)
    s = zeros((min(m, n),), real_t)
    # Integer workspace size, from the dgelsd documentation.
    nlvl = max(0, int(math.log(float(min(m, n))/2.)) + 1)
    iwork = zeros((3*min(m, n)*nlvl + 11*min(m, n),), fortran_int)
    lapack_routine = lapack_lite.dgelsd
    # Workspace query: lwork = -1 makes dgelsd report the optimal
    # workspace size in work[0] without solving anything.
    lwork = 1
    work = zeros((lwork,), t)
    results = lapack_routine(m, n, n_rhs, a, m, bstar, ldb, s, rcond,
                             0, work, -1, iwork, 0)
    lwork = int(work[0])
    work = zeros((lwork,), t)
    # The actual solve.
    results = lapack_routine(m, n, n_rhs, a, m, bstar, ldb, s, rcond,
                             0, work, lwork, iwork, 0)
    if results['info'] > 0:
        raise LinAlgError('SVD did not converge in Linear Least Squares')
    # The first n entries of each rhs column hold the solution.
    if is_1d:
        x = array(ravel(bstar)[:n], dtype=result_t, copy=True)
    else:
        x = array(transpose(bstar)[:n, :], dtype=result_t, copy=True)
    return wrap(x)
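
And a quick sanity check; lstsq3 is assumed to have been dropped into
numpy/linalg/linalg.py (or a copy of it) so that the private helpers it
uses resolve:

import numpy as np
# lstsq3 as defined above, imported from wherever it was patched in.

x = np.random.rand(10, 2)
y = np.random.rand(10)
coef = lstsq3(x, y)
coef_ref = np.linalg.lstsq(x, y)[0]  # the solution is the first element
assert np.allclose(coef, coef_ref)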

> Out of curiosity, is there an explicit way to check if these share memory?
>
> import numpy as np
> b = np.empty((1000,2))
> c = np.arange(1000)[:,None]
> b[:,1:] = c
>
> Then what?
>
> Skipper
>
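
numpy does ship an explicit, if conservative, check for this:
np.may_share_memory, which tests whether the memory bounds of the two
arrays overlap (so it can give false positives, but never false
negatives). Applied to the example above:

import numpy as np

b = np.empty((1000, 2))
c = np.arange(1000)[:, None]
b[:, 1:] = c
np.may_share_memory(b, c)         # False: __setitem__ copied c's data
np.may_share_memory(b, b[:, 1:])  # True: basic slicing returns a view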
>>> I thought that array.__setitem__ never gives a view of the right-hand
>>> side. If that's so, then the copy is not needed. That would save some
>>> time since b can be large. All unit tests pass when I remove the copy,
>>> but you know how that goes...
>>>
>>> I also noticed that "import math" is done inside the lstsq function.
>>> Why is that?


