[SciPy-User] PCA for sparse matrices, tolerance of eigenvalues

Gael Varoquaux gael.varoquaux at normalesup.org
Thu Feb 24 08:14:23 EST 2011


On Thu, Feb 24, 2011 at 11:37:18AM +0000, Pauli Virtanen wrote:
> > First of all, is that a good way to go about it?

> If it doesn't work as a dense matrix, then you don't have much choice
> other than to rely on an iterative method. 'eigs' uses ARPACK.

For truncated PCA, where errors on the small eigenvalues are not
important, and for very large data (much, much larger than the cache
size), randomized projection methods can work incredibly well (partly
because they make the problem local in memory, and with large data
memory bandwidth is a killer). We have a fast truncated SVD in
scikit-learn that is fairly standalone and can be extracted:
https://github.com/scikit-learn/scikit-learn/blob/master/scikits/learn/utils/extmath.py

Don't use this for applications other than PCA!
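
To give a rough idea of what a randomized projection method looks like,
here is a minimal sketch. It is not the scikit-learn code linked above:
the function name randomized_truncated_svd and its parameters
(n_oversamples, n_power_iter, seed) are made up for illustration, and
the defaults are arbitrary. It also does not center the data, which
proper PCA requires and which is awkward to do on a sparse matrix
without densifying it.

```python
import numpy as np
from scipy import sparse


def randomized_truncated_svd(A, k, n_oversamples=10, n_power_iter=2, seed=0):
    """Approximate the k largest singular triplets of A (dense or sparse).

    Illustrative sketch only; name and parameters are hypothetical.
    """
    rng = np.random.default_rng(seed)
    n_cols = A.shape[1]

    # Project A onto a small random subspace; only products with A are
    # needed, so A can stay sparse.
    omega = rng.standard_normal((n_cols, k + n_oversamples))
    Q, _ = np.linalg.qr(A @ omega)

    # A few power iterations sharpen the separation between the large
    # singular values and the rest. Re-orthonormalize for stability.
    for _ in range(n_power_iter):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # B = Q^T A is a small dense matrix; take its exact SVD.
    B = (A.T @ Q).T
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small

    # Keep the k leading components; the small ones are not reliable.
    return U[:, :k], s[:k], Vt[:k]


# Example call on a random sparse matrix (made-up data).
X = sparse.random(1000, 200, density=0.01, format="csr", random_state=0)
U, s, Vt = randomized_truncated_svd(X, k=5)
```

Only the leading components should be trusted here, which is why this
kind of method suits truncated PCA and not problems that need accurate
small eigenvalues.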

G


