[SciPy-User] PCA functions

josef.pktd at gmail.com josef.pktd at gmail.com
Wed May 19 11:44:20 EDT 2010


On Wed, May 19, 2010 at 11:31 AM, Zachary Pincus
<zachary.pincus at yale.edu> wrote:
> Hi Mike,
>
> Here's what I use. I don't think there's anything in scipy per se, but
> I might be wrong.

There is nothing directly in numpy/scipy but many packages have their own
version,
the most heavy duty version might be in MDP
http://mail.scipy.org/pipermail/nipy-devel/2009-December/002528.html

and the corresponding thread
http://mail.scipy.org/pipermail/nipy-devel/2009-December/002474.html

Josef

>
> Empirically, I find that doing the PCA with eigh is faster than with
> svd, but this might be based on the dimensionality of my data vs. the
> number of data points I use. The functions take in an (m,n)-shaped
> matrix of m data points in n dimensions, and return an array of shape
> (k,n) consisting of k principal components in n dimensions (where
> k=min(m,n)), a (k,)-shaped array of the variances of the data along
> each principal component, and a (m,k)-shaped array of the projection
> of each data point into the subspace spanned by the principal
> components.
>
> Zach
>
>
>
> import numpy
>
> def pca_svd(flat):
>   u, s, vt = numpy.linalg.svd(flat, full_matrices = 0)
>   pcs = vt
>   v = numpy.transpose(vt)
>   data_count = len(flat)
>   variances = s**2 / data_count
>   positions =  u * s
>   return pcs, variances, positions
>
> def pca_eig(flat):
>   values, vectors = _symm_eig(flat)
>   pcs = vectors.transpose()
>   variances = values / len(flat)
>   positions = numpy.dot(flat, vectors)
>   return pcs, variances, positions
>
> def _symm_eig(a):
>   """Given input a, return the non-zero eigenvectors and eigenvalues
> of the symmetric matrix a'a.
>
>   If a has more columns than rows, then that matrix will be rank-
> deficient,
>   and the non-zero eigenvalues and eigenvectors of a'a can be more
> easily extracted
>   from the matrix aa'. From the properties of the SVD:
>     if a of shape (m,n) has SVD u*s*v', then:
>       a'a = v*s's*v'
>       aa' = u*ss'*u'
>     let s_hat, an array of shape (m,n), be such that s * s_hat = I(m,m)
>     and s_hat * s = I(n,n). Thus, we can solve for u or v in terms of
> the other:
>       v = a'*u*s_hat'
>       u = a*v*s_hat
>   """
>   m, n = a.shape
>   if m >= n:
>     # just return the eigenvalues and eigenvectors of a'a
>     vecs, vals = _eigh(numpy.dot(a.transpose(), a))
>     vecs = numpy.where(vecs < 0, 0, vecs)
>     return vecs, vals
>   else:
>     # figure out the eigenvalues and vectors based on aa', which is
> smaller
>     sst_diag, u = _eigh(numpy.dot(a, a.transpose()))
>     # in case due to numerical instabilities we have sst_diag < 0
> anywhere,
>     # peg them to zero
>     sst_diag = numpy.where(sst_diag < 0, 0, sst_diag)
>     # now get the inverse square root of the diagonal, which will
> form the
>     # main diagonal of s_hat
>     err = numpy.seterr(divide='ignore', invalid='ignore')
>     s_hat_diag = 1/numpy.sqrt(sst_diag)
>     numpy.seterr(**err)
>     s_hat_diag = numpy.where(numpy.isfinite(s_hat_diag), s_hat_diag, 0)
>     # s_hat_diag is a list of length m, a'u is (n,m), so we can just
> use
>     # numpy's broadcasting instead of matrix multiplication, and only
> create
>     # the upper mxm block of a'u, since that's all we'll use anyway...
>     v = numpy.dot(a.transpose(), u[:,:m]) * s_hat_diag
>     return sst_diag, v
>
> def _eigh(m):
>   values, vectors = numpy.linalg.eigh(m)
>   order = numpy.flipud(values.argsort())
>   return values[order], vectors[:,order]
>
>
>
>
>
>
> On May 19, 2010, at 10:53 AM, Michael Hull wrote:
>
>> Hi Everybody,
>> I am doing some work using numpy/scipy and wanted to find the
>> principle components for some data. I can write a fairly simple
>> function to do this, but was wondering if there was already a function
>> in scipy to do this that I hadn't found before re-inventing the wheel
>>
>> Many thanks,
>>
>>
>> Mike Hull
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>



More information about the SciPy-User mailing list