PCA principal component analysis

Giorgi lekishvili at python.qartu.com
Thu Apr 10 13:13:36 EDT 2003


Alexander Schmolck <a.schmolck at gmx.net> wrote in message news:<yfs7ka3iom2.fsf at black132.ex.ac.uk>...
> s.thuriez at laposte.net (sebastien) writes:
> 
> > Hi,
> > 
> > Is there any PCA analysis tools for python ?
> 
> What is the analysis tool supposed to do?
> 
> Maybe this will do what you want, once you have downloaded and installed Numeric:
> 
> # Warning: hackish, not properly tested, ripped-out bit of code ahead,
> # so no guarantees whatsoever.
> # Anyway, it should at least sort of give you the idea.
> # Try pca(X); if that doesn't do what you want, try pca(t(X)).
> 
> from Numeric import take, dot, shape, argsort, where, sqrt, transpose as t
> from LinearAlgebra import eigenvectors
> 
> def pca(M):
>     "Perform PCA on M, return eigenvectors and eigenvalues, sorted."
>     T, N = shape(M)
>     # if there are fewer rows T than columns N, use the
>     # snapshot method
>     if T < N:
>         C = dot(M, t(M))
>         evals, evecsC = eigenvectors(C)
>         # HACK: make sure evals are all non-negative
>         # (note: an eigenvalue of exactly 0 still makes 1./sqrt(evals) blow up)
>         evals = where(evals < 0, 0, evals)
>         evecs = 1./sqrt(evals) * dot(t(M), t(evecsC))
>     else:
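>         # calculate covariance matrix
>         K = 1./T * dot(t(M), M)
>         evals, evecs = eigenvectors(K)
>         # eigenvectors() returns the eigenvectors as rows; transpose them
>         # into columns so the sorting below treats both branches alike
>         evecs = t(evecs)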
>     # sort the eigenvalues and eigenvectors, descending order
>     order = (argsort(evals)[::-1])
>     evecs = take(evecs, order, 1)
>     evals = take(evals, order)
>     return evals, t(evecs)
> 
> 
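For readers without Numeric, the same idea can be sketched with NumPy,
Numeric's successor. This is only a sketch under assumptions that are not
in the quoted post: the name pca_numpy and the n_components parameter are
made up for illustration, the columns are mean-centred first, and the
covariance is divided by T - 1 rather than T. It relies on
numpy.linalg.eigh, which returns eigenvalues in ascending order and the
eigenvectors as columns.

```python
import numpy as np


def pca_numpy(M, n_components=None):
    """Return (eigenvalues, components), sorted by decreasing eigenvalue.

    Rows of M are observations and columns are variables; each row of the
    returned components array is one principal axis.
    """
    M = np.asarray(M, dtype=float)
    T = M.shape[0]
    # Assumption: centre the columns first (the quoted code does not).
    Mc = M - M.mean(axis=0)
    # Sample covariance matrix, N x N (assumption: T - 1 in the denominator).
    K = Mc.T @ Mc / (T - 1)
    # eigh is meant for symmetric matrices: eigenvalues come back in
    # ascending order and the eigenvectors are the columns of evecs.
    evals, evecs = np.linalg.eigh(K)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    if n_components is not None:
        evals, evecs = evals[:n_components], evecs[:, :n_components]
    return evals, evecs.T


# Toy data: most of the variance lies along the first column.
X = np.array([[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]])
evals, components = pca_numpy(X)
```

The sign of each component is arbitrary, so compare components up to a
factor of -1.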

Nice, but with big matrices it is wasteful to compute all of the PCs
when we are going to drop most of them anyway. NIPALS extracts the
components one at a time, and implementing it with Numeric is about
10 minutes' work.
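A rough sketch of what such a NIPALS routine might look like, written
with plain Python lists so that it runs without Numeric installed; with
Numeric the inner loops would collapse into dot() calls. The name nipals
and the parameters n_components, tol and max_iter are illustrative and
not taken from any library. It centres the columns, then extracts one
component at a time and deflates the data, so only the requested
components are ever computed.

```python
def nipals(X, n_components=2, tol=1e-10, max_iter=500):
    """Return (scores, loadings, eigenvalues) for the leading components.

    X is a list of rows (observations); hypothetical helper, for illustration.
    """
    n_rows, n_cols = len(X), len(X[0])
    # Centre each column; PCA is defined on mean-centred data.
    means = [sum(row[j] for row in X) / n_rows for j in range(n_cols)]
    E = [[row[j] - means[j] for j in range(n_cols)] for row in X]
    scores, loadings, eigenvalues = [], [], []
    for _ in range(n_components):
        # Start from the residual column with the largest sum of squares.
        start = max(range(n_cols), key=lambda j: sum(r[j] ** 2 for r in E))
        t = [r[start] for r in E]
        if sum(v * v for v in t) == 0.0:
            break  # no variance left to explain
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            # Loading vector p = E^T t / (t^T t), then normalised to length 1.
            p = [sum(E[i][j] * t[i] for i in range(n_rows)) / tt
                 for j in range(n_cols)]
            norm = sum(v * v for v in p) ** 0.5
            p = [v / norm for v in p]
            # Updated score vector t = E p.
            t_new = [sum(E[i][j] * p[j] for j in range(n_cols))
                     for i in range(n_rows)]
            diff = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if diff <= tol * sum(v * v for v in t):
                break
        # Deflate: remove the part of E explained by this component.
        E = [[E[i][j] - t[i] * p[j] for j in range(n_cols)]
             for i in range(n_rows)]
        scores.append(t)
        loadings.append(p)
        # Assumption: sample variance with n_rows - 1 in the denominator.
        eigenvalues.append(sum(v * v for v in t) / (n_rows - 1))
    return scores, loadings, eigenvalues


# Toy data: ask only for the first component.
X = [[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]]
scores, loadings, eigenvalues = nipals(X, n_components=1)
```

Each extra component costs another round of iterations, so this pays
off when only a handful of PCs are needed from a large matrix.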




> You can download Numeric and use it to compute the eigenvalues and
> eigenvectors of an array.
> 
> 
> > If it does, do you have any idea on how well it would scale ?
> 
> It should scale fine. If you experience speed problems, configure Numeric 23
> with ATLAS support (you have to install ATLAS and LAPACK first, of course).
> For large matrices, this should be *much* faster than handwritten C code that
> doesn't use ATLAS.
> 
> > 
> > I have already seen PyClimate (but it is not available for Windows
> > which will be one of the target). Is there some LAPACK like packages ?
> 
> Yes, Numeric and scipy. (www.numpy.org, www.scipy.org, I should think)
> 
> 'as



