[SciPy-Dev] Slight change in lstsq accuracy for 0.17 - expected?

Matthew Brett matthew.brett at gmail.com
Sat Mar 19 15:31:32 EDT 2016


On Fri, Mar 18, 2016 at 1:19 PM,  <josef.pktd at gmail.com> wrote:
>
>
> On Fri, Mar 18, 2016 at 4:09 PM, <josef.pktd at gmail.com> wrote:
>>
>>
>>
>> On Fri, Mar 18, 2016 at 3:56 PM, Matthew Brett <matthew.brett at gmail.com>
>> wrote:
>>>
>>> On Fri, Mar 18, 2016 at 12:19 PM,  <josef.pktd at gmail.com> wrote:
>>> >
>>> >
>>> > On Fri, Mar 18, 2016 at 3:07 PM, Matthew Brett
>>> > <matthew.brett at gmail.com>
>>> > wrote:
>>> >>
>>> >> On Wed, Mar 16, 2016 at 8:49 PM, Charles R Harris
>>> >> <charlesr.harris at gmail.com> wrote:
>>> >> >
>>> >> >
>>> >> > On Wed, Mar 16, 2016 at 9:39 PM, <josef.pktd at gmail.com> wrote:
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On Wed, Mar 16, 2016 at 11:01 PM, Charles R Harris
>>> >> >> <charlesr.harris at gmail.com> wrote:
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>> On Wed, Mar 16, 2016 at 7:16 PM, Matthew Brett
>>> >> >>> <matthew.brett at gmail.com>
>>> >> >>> wrote:
>>> >> >>>>
>>> >> >>>> Hi,
>>> >> >>>>
>>> >> >>>> On Wed, Mar 16, 2016 at 5:48 PM, Charles R Harris
>>> >> >>>> <charlesr.harris at gmail.com> wrote:
>>> >> >>>> >
>>> >> >>>> >
>>> >> >>>> > On Wed, Mar 16, 2016 at 6:41 PM, Charles R Harris
>>> >> >>>> > <charlesr.harris at gmail.com> wrote:
>>> >> >>>> >>
>>> >> >>>> >>
>>> >> >>>> >>
>>> >> >>>> >> On Wed, Mar 16, 2016 at 6:05 PM, Matthew Brett
>>> >> >>>> >> <matthew.brett at gmail.com>
>>> >> >>>> >> wrote:
>>> >> >>>> >>>
>>> >> >>>> >>> Hi,
>>> >> >>>> >>>
>>> >> >>>> >>> On Sun, Mar 13, 2016 at 3:43 AM, Ralf Gommers
>>> >> >>>> >>> <ralf.gommers at gmail.com>
>>> >> >>>> >>> wrote:
>>> >> >>>> >>> >
>>> >> >>>> >>> >
>>> >> >>>> >>> > On Sat, Mar 12, 2016 at 6:16 AM, Matthew Brett
>>> >> >>>> >>> > <matthew.brett at gmail.com>
>>> >> >>>> >>> > wrote:
>>> >> >>>> >>> >>
>>> >> >>>> >>> >> Hi,
>>> >> >>>> >>> >>
>>> >> >>>> >>> >> I just noticed that our (neuroimaging) test suite fails
>>> >> >>>> >>> >> for
>>> >> >>>> >>> >> scipy
>>> >> >>>> >>> >> 0.17.0:
>>> >> >>>> >>> >>
>>> >> >>>> >>> >> https://travis-ci.org/matthew-brett/nipy/jobs/115471563
>>> >> >>>> >>> >>
>>> >> >>>> >>> >> This turns out to be due to small changes in the result of
>>> >> >>>> >>> >> `scipy.linalg.lstsq` between 0.16.1 and 0.17.0 - on the
>>> >> >>>> >>> >> same
>>> >> >>>> >>> >> platform
>>> >> >>>> >>> >> - the travis Ubuntu in this case.
>>> >> >>>> >>> >>
>>> >> >>>> >>> >> Is that expected?   I haven't investigated deeply, but I
>>> >> >>>> >>> >> believe
>>> >> >>>> >>> >> the
>>> >> >>>> >>> >> 0.16.1 result (back into the depths of time) is slightly
>>> >> >>>> >>> >> more
>>> >> >>>> >>> >> accurate.  Is that possible?
>>> >> >>>> >>> >
>>> >> >>>> >>> >
>>> >> >>>> >>> > That's quite possible, because we changed the
>>> >> >>>> >>> > underlying
>>> >> >>>> >>> > LAPACK
>>> >> >>>> >>> > routine
>>> >> >>>> >>> > that's used:
>>> >> >>>> >>> >
>>> >> >>>> >>> >
>>> >> >>>> >>> >
>>> >> >>>> >>> >
>>> >> >>>> >>> > https://github.com/scipy/scipy/blob/master/doc/release/0.17.0-notes.rst#scipy-linalg-improvements.
>>> >> >>>> >>> > This PR also had some lstsq precision issues, so could be
>>> >> >>>> >>> > useful:
>>> >> >>>> >>> > https://github.com/scipy/scipy/pull/5550
>>> >> >>>> >>> >
>>> >> >>>> >>> > What is the accuracy difference in your case?
>>> >> >>>> >>>
>>> >> >>>> >>> Damn - I thought you might ask me that!
>>> >> >>>> >>>
>>> >> >>>> >>> The test where it showed up comes from some calculations on a
>>> >> >>>> >>> simple
>>> >> >>>> >>> regression problem.
>>> >> >>>> >>>
>>> >> >>>> >>> The regression model is $y = X B + e$, with ordinary least
>>> >> >>>> >>> squares
>>> >> >>>> >>> fit
>>> >> >>>> >>> given by $B = X^+ y$ where $X^+$ is the pseudoinverse of
>>> >> >>>> >>> matrix
>>> >> >>>> >>> X.
>>> >> >>>> >>>
>>> >> >>>> >>> For my particular X, $\Sigma e^2$ (SSE) is smaller using
>>> >> >>>> >>> scipy
>>> >> >>>> >>> 0.16
>>> >> >>>> >>> pinv, compared to scipy 0.17 pinv:
>>> >> >>>> >>>
>>> >> >>>> >>> # SSE using high-precision sympy pinv
>>> >> >>>> >>> 72.0232781366615313
>>> >> >>>> >>> # SSE using scipy 0.16 pinv
>>> >> >>>> >>> 72.0232781366625261
>>> >> >>>> >>> # SSE using scipy 0.17 pinv
>>> >> >>>> >>> 72.0232781390480170
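
[For reference, a minimal sketch of what the `sse` / `print_scalar`
helpers used further down this thread look like - roughly, the exact
code isn't reproduced here; X and y are the design matrix and data from
the failing test:

import numpy as np
import numpy.linalg as npl

def sse(X, piX, y):
    # sum of squared residuals, using a supplied pseudoinverse piX
    B = piX.dot(y)        # fitted parameters B = X^+ y
    e = y - X.dot(B)      # residuals
    return e.dot(e)

def print_scalar(x):
    print('%.16f' % x)

# use scipy.linalg.pinv(X) here instead to test the scipy routine
print_scalar(sse(X, npl.pinv(X), y))
]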
>>> >> >>>> >>
>>> >> >>>> >>
>>> >> >>>> >> That difference looks pretty big, looks almost like a cutoff
>>> >> >>>> >> effect.
>>> >> >>>> >> What
>>> >> >>>> >> is the matrix X, in particular, what is its condition number
>>> >> >>>> >> or
>>> >> >>>> >> singular
>>> >> >>>> >> values? Might also be interesting to see residuals for the
>>> >> >>>> >> different
>>> >> >>>> >> functions.
>>> >> >>>> >>
>>> >> >>>> >> <snip>
>>> >> >>>> >>
>>> >> >>>> >
>>> >> >>>> > Note that this might be an indication that the result of the
>>> >> >>>> > calculation has
>>> >> >>>> > an intrinsically large error bar.
>>> >> >>>>
>>> >> >>>> In [6]: np.linalg.svd(X, compute_uv=False)
>>> >> >>>> Out[6]:
>>> >> >>>> array([  6.97301629e+07,   5.78702921e+05,   5.39646908e+05,
>>> >> >>>>          2.99383626e+05,   3.24329208e+04,   1.29520746e+02,
>>> >> >>>>          3.16197688e+01,   2.66011711e+00,   2.18287258e-01,
>>> >> >>>>          1.60105276e-01,   1.11807228e-01,   2.12694535e-03,
>>> >> >>>>          1.72675581e-03,   9.51576795e-04])
>>> >> >>>>
>>> >> >>>> In [7]: np.linalg.cond(X)
>>> >> >>>> Out[7]: 73278544935.300507
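
As an aside: the quantity that an `rcond` / `cond` cutoff is compared
against is the ratio of each singular value to the largest one, so the
relevant numbers here are:

s = np.linalg.svd(X, compute_uv=False)
print(s / s[0])   # the three smallest ratios are around 1e-11 for this X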
>>> >> >>>
>>> >> >>>
>>> >> >>> Condition number is about 7e10, so you might expect to lose about
>>> >> >>> 10
>>> >> >>> digits of accuracy just due to roundoff.  The covariance of the
>>> >> >>> fitting
>>> >> >>> error will be `inv(X.T @ X) * error`, so you can look at the
>>> >> >>> diagonals
>>> >> >>> for
>>> >> >>> the variance of the fitted parameters just for kicks, taking
>>> >> >>> `error`
>>> >> >>> to be
>>> >> >>> roundoff error or measurement error. This is not to say that the
>>> >> >>> results shouldn't be more reproducible, but it is relevant to the
>>> >> >>> validity of the solution. Note that I suspect that there may be a
>>> >> >>> difference
>>> >> >>> in the
>>> >> >>> default `rcond` used, or maybe the matrix is on the edge and a
>>> >> >>> small
>>> >> >>> change
>>> >> >>> in results changes the effective rank.  You can run with the
>>> >> >>> `return_rank=True` argument to check that. Looks like machine
>>> >> >>> precision is
>>> >> >>> used for the default value of rcond, so experimenting with that
>>> >> >>> probably
>>> >> >>> won't be informative, but you might want to try it anyway.
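
A sketch of those checks, assuming the scipy 0.17 interfaces (X and y
as above; the covariance line is only a rough diagnostic, since X'X is
itself badly conditioned here):

import numpy as np
import scipy.linalg as spl

# effective rank as reported by lstsq with the default cutoff
B, res, rank, s = spl.lstsq(X, y)
print("effective rank:", rank, "of", X.shape[1])

# the same check via pinv
Xpinv, rank = spl.pinv(X, return_rank=True)

# rough scale of the parameter variances: diag(inv(X'X)),
# to be multiplied by the (roundoff or measurement) error variance
print(np.diag(spl.inv(X.T.dot(X))))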
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> statsmodels is using numpy pinv for the linear model and it works
>>> >> >> in
>>> >> >> almost all cases very well *).
>>> >> >>
>>> >> >> Using pinv with singular (given the rcond threshold) X makes
>>> >> >> everything
>>> >> >> well defined and not subject to numerical noise because of the
>>> >> >> regularization induced by pinv.  Our covariance for the parameter
>>> >> >> estimates,
>>> >> >> based on pinv(X), is also not noisy but it's of reduced rank.
>>> >> >
>>> >> >
>>> >> > What I'm thinking is that changing the algorithm, as here, is like
>>> >> > introducing noise relative to the original, so even if the result is
>>> >> > perfectly valid, the fitted parameters might change significantly. A
>>> >> > change
>>> >> > in the 11-th digit wouldn't surprise me at all, depending on the
>>> >> > actual
>>> >> > X
>>> >> > and rhs.
>>> >>
>>> >> I believe the errors $e = y - X X^+ y$ are in general robust to
>>> >> differences between valid pinv solutions, even in extreme cases:
>>> >>
>>> >> In [2]: print_scalar(sse(X, npl.pinv(X), y))
>>> >> 72.0232781366616024
>>> >>
>>> >> In [3]: print_scalar(sse(np.c_[X, X], npl.pinv(np.c_[X, X]), y))
>>> >> 72.0232781366939463
>>> >>
>>> >> In [4]: npl.cond(np.c_[X, X])
>>> >> Out[4]: 6.6247965335098062e+26
>>> >
>>> >
>>> >
>>> > I think this does not depend on the smallest eigenvalues which should
>>> > be
>>> > regularized away in cases like this.
>>> > The potential problem is with eigenvalues that are just a bit too large
>>> > for
>>> > rcond to remove them but still have numerical problems.
>>> >
>>> > I think it's a similar issue about the threshold as the one that you
>>> > raised
>>> > some time ago with determining the matrix rank for near singular
>>> > matrices.
>>>
>>> My point was (I think you're agreeing) that the accuracy of the pinv
>>> does not depend in a simple way on the condition number of the input
>>> matrix.
>>
>>
>> That's not the point I wanted to make.
>> The condition number doesn't matter if there is a singular
>> value/eigenvalue whose relative magnitude is 1e-16 or 1e-30 or 1e-100,
>> far below any rcond cutoff.
>> However, the numerical accuracy might depend on the condition number
>> when there is a singular value with relative magnitude around 2e-15 or
>> 5e-15, just at the edge of the cutoff.
>> I didn't manage to find the default cond or rcond for scipy.linalg.pinv
>> and lstsq. The problem I had was with numpy's pinv.
>> I think there is a grey zone of singular values near the rcond cutoff
>> that might be bad numerically.
>>
>> (My only experience is with SVD based pinv.)
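
To make the cutoff explicit, an SVD-based pinv is essentially the
sketch below, and the "grey zone" is the set of singular values sitting
only just above `rcond * s[0]`:

import numpy as np

def svd_pinv(a, rcond=1e-15):
    # pseudoinverse via SVD, zeroing singular values below rcond * s_max
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    s_inv = np.where(s > rcond * s[0], 1.0 / s, 0.0)
    return (vt.T * s_inv).dot(u.T)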
>
>
> In the singular values array that you showed earlier, you have three
> small singular values, but their ratios to the largest singular value
> are only around 1e-11.
> So from my experience with SVD pinv, there shouldn't be any problem
> with a pinv in this case.

So - is there any downside to recommending numpy.linalg.pinv over
scipy.linalg.pinv in general?
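
For concreteness, the calls I'd be choosing between are below; as far
as I understand, numpy's pinv and scipy's pinv2 are SVD-based, while
scipy's pinv goes through lstsq (and so picked up the new LAPACK driver
in 0.17):

import numpy as np
import numpy.linalg as npl
import scipy.linalg as spl

B_np  = npl.pinv(X)     # SVD-based; rcond defaults to 1e-15
B_sp  = spl.pinv(X)     # least-squares based
B_sp2 = spl.pinv2(X)    # SVD-based

for name, piX in [('numpy pinv', B_np), ('scipy pinv', B_sp),
                  ('scipy pinv2', B_sp2)]:
    e = y - X.dot(piX.dot(y))
    print(name, '%.16f' % e.dot(e))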

Cheers,

Matthew


