[SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal

josef.pktd at gmail.com
Sun Jan 18 09:10:54 EST 2015


On Sun, Jan 18, 2015 at 8:51 AM, Alexander Grigorievskiy <
alex.grigorievskiy at gmail.com> wrote:

> Hello Scipy,
>
> I have been investigating least-squares solvers recently, and I found
> out that the current implementation inside scipy.linalg.lstsq is not
> optimal. Of course I may be wrong, but here are my arguments:
>
> 1) Internally it calls the LAPACK function "gelss", which uses the SVD
> (as stated in the LAPACK documentation). There are, however, two more
> routines that solve the same problem: "gelsy" (uses a complete
> orthogonal factorization) and "gelsd" (uses a divide-and-conquer SVD
> algorithm). The LAPACK documentation states that "gelsd" works faster
> than "gelss".
> http://www.netlib.org/lapack/lug/node27.html
>
> 2) NumPy has the same function, "lstsq", and it calls "gelsd".
>
> 3) I ran my own tests analyzing the speed of four functions:
> "gelss" (the one currently used in scipy.linalg.lstsq)
> "gelsy" (present in LAPACK; uses a complete orthogonal factorization)
> "gelsd" (present in LAPACK; uses a divide-and-conquer SVD)
> numpy.lstsq (uses "gelsd" from lapack_lite)
>
> I have imported the missing LAPACK functions into SciPy by including
> them in the file scipy/linalg/flapack.pyf.src, recompiling SciPy, and
> creating almost the same function as "lstsq" but calling the different
> LAPACK routines.
>
> The file with the test results is attached.
> Based on this, the fastest method is "gelsy", closely followed by
> "gelsd". I still want to run a couple more tests on how fast the
> methods scale when only one dimension of the matrix changes; you are
> also welcome to propose other tests.
> I also monitored accuracy; the results are almost the same, and I can
> send the plot if anyone is interested.
>
> I would like to propose to change the current implementation to call
> "gelsy" or "gelsd" from LAPACK along with corresponding modifications.
>
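The comparison described above can be sketched roughly as follows. This is a minimal benchmark assuming recent NumPy/SciPy builds (where scipy.linalg.lstsq still called "gelss" and numpy.linalg.lstsq calls "gelsd"); exact timings will of course vary with the machine and the BLAS/LAPACK build, but both drivers should agree on the solution:

```python
# Sketch of the timing comparison: scipy.linalg.lstsq vs numpy.linalg.lstsq
# on a random overdetermined system. The timing numbers are illustrative;
# the agreement check between the two solutions is the stable part.
import time
import numpy as np
import scipy.linalg

rng = np.random.RandomState(0)
A = rng.randn(500, 100)   # 500 equations, 100 unknowns
b = rng.randn(500)

t0 = time.perf_counter()
x_scipy, res, rank, sv = scipy.linalg.lstsq(A, b)
t1 = time.perf_counter()
x_numpy = np.linalg.lstsq(A, b, rcond=None)[0]
t2 = time.perf_counter()

print("scipy lstsq: %.4fs, numpy lstsq: %.4fs" % (t1 - t0, t2 - t1))
print("solutions agree:", np.allclose(x_scipy, x_numpy))
```

For a real benchmark one would repeat each call many times and over a grid of matrix shapes, as the tests in the attachment presumably do.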


In terms of backwards compatibility, it is also necessary to check the
behavior in bad cases, for example singular, ill-conditioned, and
near-singular matrices. For the latter, the NIST test cases would provide
a good check.

I don't know how well the current lstsq does in those cases, since I
never checked. Using scipy's pinv (SVD-based?) for linear regression
works pretty well there.
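A minimal sketch of that pinv-based approach, on a deliberately near-singular design matrix (one column is an almost exact copy of another; the data here are made up for illustration):

```python
# Regression on an ill-conditioned design matrix: scipy.linalg.pinv
# (SVD-based pseudoinverse) and scipy.linalg.lstsq both fit the data
# with a small residual despite the near-singular X.
import numpy as np
import scipy.linalg

rng = np.random.RandomState(1)
X = rng.randn(50, 3)
# append a column that nearly duplicates the first -> X is badly conditioned
X = np.column_stack([X, X[:, 0] + 1e-10 * rng.randn(50)])
y = X @ np.array([1.0, 2.0, 3.0, 4.0]) + 0.01 * rng.randn(50)

beta_pinv = scipy.linalg.pinv(X) @ y       # pseudoinverse solve
beta_lstsq = scipy.linalg.lstsq(X, y)[0]   # SVD-based lstsq

print("residual (pinv): ", np.linalg.norm(y - X @ beta_pinv))
print("residual (lstsq):", np.linalg.norm(y - X @ beta_lstsq))
```

Note that with such collinearity the individual coefficients are not well determined even when the fitted values are fine, which is exactly why the behavior of the different LAPACK drivers on rank-deficient input needs checking.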

Josef



>
> Best Regards,
>     Alexander Grigorevskiy, PhD student, Aalto University.
>
>
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>
>