[SciPy-Dev] Slight change in lstsq accuracy for 0.17 - expected?

Wed Mar 16 21:16:58 EDT 2016

Hi,

On Wed, Mar 16, 2016 at 5:48 PM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>
>
> On Wed, Mar 16, 2016 at 6:41 PM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
>>
>>
>>
>> On Wed, Mar 16, 2016 at 6:05 PM, Matthew Brett <matthew.brett at gmail.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> On Sun, Mar 13, 2016 at 3:43 AM, Ralf Gommers <ralf.gommers at gmail.com>
>>> wrote:
>>> >
>>> >
>>> > On Sat, Mar 12, 2016 at 6:16 AM, Matthew Brett
>>> > <matthew.brett at gmail.com>
>>> > wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I just noticed that our (neuroimaging) test suite fails for scipy
>>> >> 0.17.0:
>>> >>
>>> >> https://travis-ci.org/matthew-brett/nipy/jobs/115471563
>>> >>
>>> >> This turns out to be due to small changes in the result of
>>> >> `scipy.linalg.lstsq` between 0.16.1 and 0.17.0 - on the same platform
>>> >> - the travis Ubuntu in this case.
>>> >>
>>> >> Is that expected?   I haven't investigated deeply, but I believe the
>>> >> 0.16.1 result (back into the depths of time) is slightly more
>>> >> accurate.  Is that possible?
>>> >
>>> >
>>> > That's very well possible, because we changed the underlying LAPACK
>>> > routine
>>> > that's used:
>>> >
>>> > https://github.com/scipy/scipy/blob/master/doc/release/0.17.0-notes.rst#scipy-linalg-improvements.
>>> > This PR also had some lstsq precision issues, so could be useful:
>>> > https://github.com/scipy/scipy/pull/5550
>>> >
>>> > What is the accuracy difference in your case?
>>>
>>> Damn - I thought you might ask me that!
>>>
>>> The test where it showed up comes from some calculations on a simple
>>> regression problem.
>>>
>>> The regression model is $y = X B + e$, with ordinary least squares fit
>>> given by $B = X^+ y$ where $X^+$ is the pseudoinverse of matrix X.
>>>
>>> For my particular X, $\Sigma e^2$ (SSE) is smaller using scipy 0.16
>>> pinv, compared to scipy 0.17 pinv:
>>>
>>> # SSE using high-precision sympy pinv
>>> 72.0232781366615313
>>> # SSE using scipy 0.16 pinv
>>> 72.0232781366625261
>>> # SSE using scipy 0.17 pinv
>>> 72.0232781390480170
>>
>>
>> That difference looks pretty big, looks almost like a cutoff effect. What
>> is the matrix X, in particular, what is its condition number or singular
>> values? Might also be interesting to see residuals for the different
>> functions.
>>
>> <snip>
>>
>
> Note that this might be an indication that the result of the calculation has
> an intrinsically large error bar.

In [6]: np.linalg.svd(X, compute_uv=False)
Out[6]:
array([  6.97301629e+07,   5.78702921e+05,   5.39646908e+05,
         2.99383626e+05,   3.24329208e+04,   1.29520746e+02,
         3.16197688e+01,   2.66011711e+00,   2.18287258e-01,
         1.60105276e-01,   1.11807228e-01,   2.12694535e-03,
         1.72675581e-03,   9.51576795e-04])

In [7]: np.linalg.cond(X)
Out[7]: 73278544935.300507

I wasn't sure what you meant about residuals from different functions
- can you explain more?

Cheers,

Matthew