[scikit-learn] Re: unclear help file for sklearn.decomposition.pca

Andreas Mueller t3kcit at gmail.com
Mon Oct 16 14:44:51 EDT 2017



On 10/16/2017 02:27 PM, Ismael Lemhadri wrote:
> @Andreas Muller:
> My references do not assume centering, e.g. 
> http://ufldl.stanford.edu/wiki/index.php/PCA
> any reference?
>
It kind of does, but it is not very clear about it:

"This data has already been pre-processed so that each of the features 
x_1 and x_2 have about the same mean (zero) and variance."



Wikipedia is much clearer:

"Consider a data matrix, X, with column-wise zero empirical mean (the 
sample mean of each column has been shifted to zero), where each of the 
n rows represents a different repetition of the experiment, and each of 
the p columns gives a particular kind of feature (say, the results from 
a particular sensor)."
https://en.wikipedia.org/wiki/Principal_component_analysis#Details

I'm a bit surprised to find that ESL says "The SVD of the centered 
matrix X is another way of expressing the principal components of the 
variables in X", so do they assume scaling? They don't really have a 
great treatment of PCA, though.
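
As a quick sanity check of the ESL statement (this snippet is mine, not 
part of the thread), the SVD of the column-centered data matrix 
(centering only, no scaling) gives the same variances as the 
eigendecomposition of the sample covariance matrix:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 3))  # toy data, n=100, p=3

Xc = X - X.mean(axis=0)                  # center each column, do not scale
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt                          # rows are the principal directions
scores = Xc @ Vt.T                       # the projected data (equivalently U * S)
explained_var = S ** 2 / (len(X) - 1)

# Cross-check against the eigenvalues of the sample covariance matrix.
evals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))
print(np.allclose(np.sort(evals), np.sort(explained_var)))   # True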

Bishop <http://www.springer.com/us/book/9780387310732> and Murphy 
<https://mitpress.mit.edu/books/machine-learning-0> are pretty clear 
that they subtract the mean (or assume zero mean) but don't standardize.
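
For what it's worth, here is a quick (unofficial) way to convince 
yourself that scikit-learn's PCA behaves the same way, i.e. it subtracts 
the mean internally but does not rescale the features; the toy data 
below is just for illustration:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([1.0, 5.0, 0.1, 10.0])  # features on very different scales

pca_raw = PCA().fit(X)
pca_centered = PCA().fit(X - X.mean(axis=0))
pca_standardized = PCA().fit((X - X.mean(axis=0)) / X.std(axis=0))

# Identical: PCA already removed the mean internally.
print(np.allclose(pca_raw.components_, pca_centered.components_))
# Different (in general): rescaling each feature changes the principal directions.
print(np.allclose(np.abs(pca_raw.components_), np.abs(pca_standardized.components_)))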