[SciPy-user] getting the "best" two eigenvectors for a PCA analysis with a power method

Robert Kern rkern at ucsd.edu
Wed Aug 31 10:40:41 EDT 2005


Noel O'Boyle wrote:
> R calculates the PCs using singular value decomposition, instead of
> using the eigenvalues of the covariance matrix. From the R manual for
> prcomp:
> 
> "The calculation is done by a singular value decomposition of the
>      (centered and possibly scaled) data matrix, not by using 'eigen'
>      on the covariance matrix.  This is generally the preferred method
>      for numerical accuracy."
> 
> (1) Can anyone comment on what numerical accuracy means in this context,
> and whether one should really care about this for principal component
> analysis?

Forming the covariance matrix necessarily involves more floating-point
operations, in which error accumulates (not to mention an unnecessary
division). Its eigenvalues are the squares of the data matrix's singular
values, so it may also have elements with very large differences in
magnitude (indeed, that's essentially the entire point of PCA), and the
small components can be swamped by roundoff before the decomposition even
starts. Ideally you want to push that occurrence as late as possible in
the process.

Yes, you should probably care.
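
To make the difference concrete, here is a minimal NumPy sketch of the two
routes. It is an illustration under stated assumptions, not what R or any
particular package does internally: the names pca_svd, pca_eig and
make_test_data are made up for this example, and the 4x2 data set is
constructed by hand so that its centered singular values are known to be
1 and 1e-9.

```python
import numpy as np


def pca_svd(X):
    """PCA via SVD of the centered data matrix (the approach prcomp describes)."""
    Xc = X - X.mean(axis=0)
    # s holds the singular values; the rows of Vt are the principal axes.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s**2 / (X.shape[0] - 1)
    return variances, Vt


def pca_eig(X):
    """PCA via eigendecomposition of an explicitly formed covariance matrix."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (X.shape[0] - 1)   # the extra products and the division
    w, V = np.linalg.eigh(C)           # eigh returns eigenvalues in ascending order
    order = np.argsort(w)[::-1]        # largest variance first
    return w[order], V[:, order].T


def make_test_data(s_small=1e-9):
    """Hypothetical 4x2 data set whose centered singular values are 1 and s_small."""
    # Columns of U are orthonormal and have zero mean, so centering changes nothing.
    U = 0.5 * np.array([[1, 1], [-1, 1], [1, -1], [-1, -1]], dtype=float)
    c = 1.0 / np.sqrt(2.0)
    Vt = np.array([[c, c], [-c, c]])   # 45-degree rotation, rows orthonormal
    return (U * np.array([1.0, s_small])) @ Vt


if __name__ == "__main__":
    X = make_test_data()
    # Exact variances for this data are 1/3 and 1e-18/3.
    print("exact      :", np.array([1.0, 1e-18]) / 3)
    print("SVD route  :", pca_svd(X)[0])
    print("eigh route :", pca_eig(X)[0])
```

In double precision the second variance (about 3e-19) sits below the
roundoff level of the covariance entries (about 1e-17), so the eigh route
will typically return noise for it (possibly zero or a negative number),
while the SVD route works on a singular value of 1e-9 and should recover
it to several digits. On well-conditioned data the two routes agree.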

-- 
Robert Kern
rkern at ucsd.edu

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
  -- Richard Harter



