[SciPy-Dev] Sketching-based Matrix Computations

Jordi Montes jomsdev at gmail.com
Fri Mar 24 13:23:25 EDT 2017


Short answer: Yes.

Long answer: Yes but not for all kind of matrices.

What I am proposing here is given a matrix A create a matrix S (using CWT)
with the following property with probability of at least 9/10:

||SAx||_2 = (1+-epsilon)||Ax||_2 for all x in R.

These matrices (S) are called subspace embeddings. SA can be computed in
O(nnz(A)) and the property

So basically it is an improvement for overconstrained least-squares
regression, low-rank approximation, approximation all leverage scores and
l_p-regession.

On top of that, implementing Blendenpik would remove the uncertainty (I
would use the result of CWT as a preconditioner). Blendenpik beats LAPACK's
direct dense least-squares solver by a large margin on essentially any
dense *tall *matrix.





On 23 March 2017 at 12:24, Matt Haberland <haberland at ucla.edu> wrote:

> Do I understand that this is intended to be a faster but otherwise drop-in
> replacement for scipy.linalg.lstsq or scipy.sparse.linalg.lsqr?
> I would definitely be interested in an alternative sparse least squares
> solver. I could use it.
>
> On Mar 21, 2017 11:09 PM, "Jordi Montes" <jomsdev at gmail.com> wrote:
>
>> *# I started this discussion on GitHub
>> <https://github.com/scipy/scipy/issues/7122>. This email is a digested
>> version of the conversation to check what people think about this on the
>> mailing list.*
>>
>>
>> TL;DR: *I think having an implementation for some randomize matrix
>> algorithms would be great. I would like to discuss if the community finds
>> it useful too.*
>>
>>
>> Hello,
>>
>> I am Jordi. I have been working on a xdata
>> <http://www.darpa.mil/program/xdata> open source project for the last
>> year called libSkylark <https://xdata-skylark.github.io/libskylark/>.
>> The library is suitable for general statistical data analysis and
>> optimization applications, but it is heavily focused on distributed systems.
>>
>> *Randomized matrix algorithms* have been a hot topic in research the
>> last years. Recent developments have shown their utility in large-scale
>> machine learning and statistical data analysis applications.
>>
>> Although many ML applications would take advantage of the implementation
>> of these algorithms the scope of them goes far beyond. I believe it
>> would be a good idea integrate some of them in Scipy and I would like to
>> know the opinion of others to check if it would be worth it.
>>
>> To my eyes, Randomized Numerical Linear Algebra is big/useful enough to
>> has its submodule. However, this is something that the community has to
>> decide.
>>
>> Meanwhile, to demonstrate that these methods can be useful I would
>> implement them under, scipy.linalg where the scipy linear algebra
>> functions live. The idea is to bring value since the first moment. I will
>> implement Blendenpik <http://epubs.siam.org/doi/10.1137/090767911>, a least-squares
>> solver   based on these techniques that outperforms LAPACK by
>> significant factors and scales significantly better than any QR-based
>> solver.
>>
>> The proposal is:
>>
>>    -
>>
>>    Implement CWT
>>    <http://delivery.acm.org/10.1145/2490000/2488620/p81-clarkson.pdf?ip=198.4.83.52&id=2488620&acc=ACTIVE%20SERVICE&key=90259622823F2C32%2E66F8B2C43167A3DD%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&CFID=736713469&CFTOKEN=50269727&__acm__=1489010918_f1abc3d5c4650844f4503dc1a27484ad>
>>    . CWT lets us create a matrix *S* such as *SA* is shorter than *A
>>    but conservating some properties*.
>>    -
>>
>>    Compute *QR = SA* using scipy.
>>    -
>>
>>    From here, there are two potential features:
>>    -
>>
>>       Implement Blendenpik <http://epubs.siam.org/doi/10.1137/090767911> for
>>       solving Linear Square problems faster without losing any accuracy.
>>       -
>>
>>       Compute the squared row norms to obtain leverage scores.
>>
>> Would be better to start with Blendenpik, because it solves a more
>> common problem by far.
>>
>> I am thinking about doing it for sparse matrices first because CWT can
>> take advantage of it. However, I wonder if the QR method on script takes
>> advantage of sparsity too.
>>
>> I have already set up the development environment so I could start as
>> soon as we decide what to do. As I said, I am open to discussion, so anyone
>> who could be interested is welcome.
>>
>> Thank you,
>>
>> Jordi.
>>
>>
>>
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at scipy.org
>> https://mail.scipy.org/mailman/listinfo/scipy-dev
>>
>>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> https://mail.scipy.org/mailman/listinfo/scipy-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20170324/33d076bb/attachment.html>


More information about the SciPy-Dev mailing list