PCA principal component analysis

Alexander Schmolck a.schmolck at gmx.net
Wed Apr 9 13:09:57 EDT 2003


s.thuriez at laposte.net (sebastien) writes:

> Hi,
> 
> Is there any PCA analysis tools for python ?

What is the analysis tool supposed to do?

Maybe this will do what you want, once you have downloaded and installed Numeric:

# Warning: hackish and not properly tested ripped out bit of code ahead
# so no guarantees whatsoever
# Anyway, it should at least sort of give you the idea
# try pca(X); if that doesn't do what you want try pca(t(X))

from Numeric import take, dot, shape, argsort, where, sqrt, transpose as t
from LinearAlgebra import eigenvectors

def pca(M):
    "Perform PCA on M, return eigenvectors and eigenvalues, sorted."
    T, N = shape(M)
    # if there are fewer rows T than columns N, use the
    # snapshot method (work with the smaller T x T matrix)
    if T < N:
        C = dot(M, t(M))
        evals, evecsC = eigenvectors(C)
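        # HACK: clamp round-off negatives to zero; note that an eigenvalue
        # of exactly zero still breaks the 1./sqrt(evals) below
        evals = where(evals < 0, 0, evals)
        evecs = 1./sqrt(evals) * dot(t(M), t(evecsC))
        # rescale so the eigenvalues match those of the covariance
        # matrix 1./T * dot(t(M), M) used in the other branch
        evals = 1./T * evals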
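    else:
        # calculate covariance matrix (assumes M is already mean-centered)
        K = 1./T * dot(t(M), M)
        evals, evecs = eigenvectors(K)
        # eigenvectors() returns the eigenvectors as rows; transpose so
        # that they are columns, as in the branch above
        evecs = t(evecs)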
    # sort the eigenvalues and eigenvectors, descending order
    order = (argsort(evals)[::-1])
    evecs = take(evecs, order, 1)
    evals = take(evals, order)
    return evals, t(evecs)
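
For comparison, here is a rough sketch of the same idea written against NumPy
instead of Numeric. That swap is an assumption, as are the `center` flag and
the 1e-12 cutoff for dropping near-zero eigenvalues; none of them appear in the
code above. It uses numpy.linalg.eigh, which is intended for symmetric matrices
and returns the eigenvectors as columns. Treat it as an illustration, not a
tested replacement.

```python
import numpy as np

def pca(M, center=True):
    """Return (evals, evecs), sorted by descending eigenvalue.

    evecs[i] is the unit-length principal direction belonging to evals[i].
    `center` (hypothetical flag) subtracts the column means first.
    """
    M = np.asarray(M, dtype=float)
    if center:
        M = M - M.mean(axis=0)
    T, N = M.shape
    if T < N:
        # Snapshot method: eigendecompose the small T x T matrix.
        C = M @ M.T / T
        evals, U = np.linalg.eigh(C)        # columns of U are eigenvectors
        evals = np.clip(evals, 0.0, None)   # remove round-off negatives
        # Drop (near-)zero eigenvalues so the division below is safe.
        keep = evals > 1e-12 * evals.max()
        evals, U = evals[keep], U[:, keep]
        # Map back to N-dimensional directions and normalise to unit length.
        evecs = (M.T @ U) / np.sqrt(T * evals)
    else:
        # Ordinary route: eigendecompose the N x N covariance matrix.
        K = M.T @ M / T
        evals, evecs = np.linalg.eigh(K)
        evals = np.clip(evals, 0.0, None)
    order = np.argsort(evals)[::-1]         # descending order
    return evals[order], evecs[:, order].T
```

Each row of the returned evecs is one principal direction, so projecting
centered data Xc onto the first k components would be Xc @ evecs[:k].T.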


You can download Numeric and use it to compute the eigenvalues and
eigenvectors of an array.


> If it does, do you have any idea on how well it would scale ?

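It should scale fine: the expensive step is the eigendecomposition, and with
the snapshot method that is done on a square matrix whose size is the smaller
of T and N. If you experience speed problems, configure Numeric 23
with ATLAS support (you have to install ATLAS and LAPACK first, of course).
For large matrices, this should be *much* faster than handwritten C code that
doesn't use ATLAS.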
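
To make the scaling point a bit more concrete, here is a small sketch (using
NumPy as a stand-in for Numeric, which is an assumption, and made-up random
data) showing why the snapshot method helps: the nonzero eigenvalues of the
small T x T product and the large N x N product are the same, so only the
small one needs to be decomposed.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(20, 500))   # T = 20 rows, N = 500 columns (made-up data)

small = M @ M.T                  # 20 x 20
large = M.T @ M                  # 500 x 500

# eigvalsh returns ascending eigenvalues of a symmetric matrix; reverse them.
ev_small = np.linalg.eigvalsh(small)[::-1]
ev_large = np.linalg.eigvalsh(large)[::-1][:20]   # the rest are ~0

print(small.shape, large.shape)
print(np.allclose(ev_small, ev_large))
```

A dense eigendecomposition costs roughly cubic time in the matrix size, so
working on the 20 x 20 matrix here is far cheaper than on the 500 x 500 one.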

> 
> I have already seen PyClimate (but it is not available for Windows
> which will be one of the target). Is there some LAPACK like packages ?

Yes, Numeric and scipy. (www.numpy.org, www.scipy.org, I should think)
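
As a rough illustration of what "LAPACK-like" means in practice, here is a
minimal sketch using numpy.linalg (NumPy rather than Numeric is an assumption
here; its linear algebra routines are built on LAPACK). The matrix and vector
are made up for the example.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])       # small symmetric example matrix
b = np.array([1.0, 2.0])

x = np.linalg.solve(A, b)        # solve the linear system A x = b
w, V = np.linalg.eigh(A)         # eigenvalues w, eigenvectors as columns of V
U, s, Vt = np.linalg.svd(A)      # singular value decomposition A = U diag(s) Vt

print(x)
print(w)
print(s)
```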

'as



