[SciPy-User] PCA functions
josef.pktd at gmail.com
josef.pktd at gmail.com
Wed May 19 11:44:20 EDT 2010
On Wed, May 19, 2010 at 11:31 AM, Zachary Pincus
<zachary.pincus at yale.edu> wrote:
> Hi Mike,
>
> Here's what I use. I don't think there's anything in scipy per se, but
> I might be wrong.
There is nothing directly in numpy/scipy but many packages have their own
version,
the most heavy duty version might be in MDP
http://mail.scipy.org/pipermail/nipy-devel/2009-December/002528.html
and the corresponding thread
http://mail.scipy.org/pipermail/nipy-devel/2009-December/002474.html
Josef
>
> Empirically, I find that doing the PCA with eigh is faster than with
> svd, but this might be based on the dimensionality of my data vs. the
> number of data points I use. The functions take in an (m,n)-shaped
> matrix of m data points in n dimensions, and return an array of shape
> (k,n) consisting of k principal components in n dimensions (where
> k=min(m,n)), a (k,)-shaped array of the variances of the data along
> each principal component, and a (m,k)-shaped array of the projection
> of each data point into the subspace spanned by the principal
> components.
>
> Zach
>
>
>
> import numpy
>
> def pca_svd(flat):
> u, s, vt = numpy.linalg.svd(flat, full_matrices = 0)
> pcs = vt
> v = numpy.transpose(vt)
> data_count = len(flat)
> variances = s**2 / data_count
> positions = u * s
> return pcs, variances, positions
>
> def pca_eig(flat):
> values, vectors = _symm_eig(flat)
> pcs = vectors.transpose()
> variances = values / len(flat)
> positions = numpy.dot(flat, vectors)
> return pcs, variances, positions
>
> def _symm_eig(a):
> """Given input a, return the non-zero eigenvectors and eigenvalues
> of the symmetric matrix a'a.
>
> If a has more columns than rows, then that matrix will be rank-
> deficient,
> and the non-zero eigenvalues and eigenvectors of a'a can be more
> easily extracted
> from the matrix aa'. From the properties of the SVD:
> if a of shape (m,n) has SVD u*s*v', then:
> a'a = v*s's*v'
> aa' = u*ss'*u'
> let s_hat, an array of shape (m,n), be such that s * s_hat = I(m,m)
> and s_hat * s = I(n,n). Thus, we can solve for u or v in terms of
> the other:
> v = a'*u*s_hat'
> u = a*v*s_hat
> """
> m, n = a.shape
> if m >= n:
> # just return the eigenvalues and eigenvectors of a'a
> vecs, vals = _eigh(numpy.dot(a.transpose(), a))
> vecs = numpy.where(vecs < 0, 0, vecs)
> return vecs, vals
> else:
> # figure out the eigenvalues and vectors based on aa', which is
> smaller
> sst_diag, u = _eigh(numpy.dot(a, a.transpose()))
> # in case due to numerical instabilities we have sst_diag < 0
> anywhere,
> # peg them to zero
> sst_diag = numpy.where(sst_diag < 0, 0, sst_diag)
> # now get the inverse square root of the diagonal, which will
> form the
> # main diagonal of s_hat
> err = numpy.seterr(divide='ignore', invalid='ignore')
> s_hat_diag = 1/numpy.sqrt(sst_diag)
> numpy.seterr(**err)
> s_hat_diag = numpy.where(numpy.isfinite(s_hat_diag), s_hat_diag, 0)
> # s_hat_diag is a list of length m, a'u is (n,m), so we can just
> use
> # numpy's broadcasting instead of matrix multiplication, and only
> create
> # the upper mxm block of a'u, since that's all we'll use anyway...
> v = numpy.dot(a.transpose(), u[:,:m]) * s_hat_diag
> return sst_diag, v
>
> def _eigh(m):
> values, vectors = numpy.linalg.eigh(m)
> order = numpy.flipud(values.argsort())
> return values[order], vectors[:,order]
>
>
>
>
>
>
> On May 19, 2010, at 10:53 AM, Michael Hull wrote:
>
>> Hi Everybody,
>> I am doing some work using numpy/scipy and wanted to find the
>> principle components for some data. I can write a fairly simple
>> function to do this, but was wondering if there was already a function
>> in scipy to do this that I hadn't found before re-inventing the wheel
>>
>> Many thanks,
>>
>>
>> Mike Hull
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
More information about the SciPy-User
mailing list