[SciPy-User] linear algebra: quadratic forms without linalg.inv

Sturla Molden sturla at molden.no
Mon Nov 2 11:31:16 EST 2009


josef.pktd at gmail.com wrote:
> It really depends on the application. From the applications I know,
> pca is used for dimension reduction, when there are way too many
> regressors to avoid overfitting.

Too many regressors give you one or more tiny singular values in the 
covariance matrix (X'X), which you use in:

   betas = (X'X)**-1 * X' * y

So the inverse of X'X is heavily influenced by one or more of these 
singular values that contribute almost nothing to X'X itself. That is 
obviously ridiculous, because we want the factors that determine X'X to 
determine the inverse, (X'X)**-1, as well. I.e. we want the regression 
coefficients (betas) we estimate to be determined by the same factors 
that determine X'X.
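
A minimal NumPy sketch of that effect (the data below are made up just 
for illustration): with one nearly collinear column, the smallest 
singular value of X'X is orders of magnitude below the others, and the 
explicit inverse, and hence the betas, blow up:

   import numpy as np

   rng = np.random.default_rng(0)
   n = 100
   x1 = rng.normal(size=n)
   x2 = x1 + 1e-6 * rng.normal(size=n)       # nearly collinear with x1
   X = np.column_stack([np.ones(n), x1, x2])
   y = 2.0 + 3.0 * x1 + rng.normal(size=n)

   XtX = X.T @ X
   s = np.linalg.svd(XtX, compute_uv=False)
   print(s[0] / s[-1])                       # huge condition number

   # naive normal-equations solve: dominated by 1/s[i] for the tiny s[i]
   betas = np.linalg.inv(XtX) @ X.T @ y
   print(betas)   # coefficients on x1 and x2 are huge and nearly cancel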

So we proceed by taking the SVD of X'X and throwing the offenders out. 
In statistics, that is called "PCA", and small singular values in X'X 
are known as "multicollinearity".
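
A hedged sketch of that recipe (the function name and the rel_tol 
cutoff are mine, not standard): take the SVD of X'X, keep only the 
directions whose singular value is non-negligible, and regress on the 
corresponding principal-component scores:

   import numpy as np

   def pcr_betas(X, y, rel_tol=1e-8):
       # principal-component regression: drop directions of X'X whose
       # singular value is below rel_tol times the largest one
       XtX = X.T @ X
       # X'X is symmetric, so its SVD doubles as its eigendecomposition
       U, s, Vt = np.linalg.svd(XtX)
       keep = s > rel_tol * s[0]
       scores = X @ U[:, keep]               # principal-component scores
       gamma, *_ = np.linalg.lstsq(scores, y, rcond=None)
       return U[:, keep] @ gamma             # back to original regressors

On the ill-conditioned X above, this drops the near-zero direction and 
returns small, stable coefficients instead of the huge cancelling pair.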



When multicollinearity is present, numerical stability is the problem:

1 / s[i] becomes infinite for s[i] == 0, and thus that term dominates 
(X'X)**-1 completely. But with s[i] == 0, s[i] does not even contribute 
to X'X. So it makes sense to edit the too-small s[i] values out, so that 
only the values of s[i] that matter for X'X are used to compute 
(X'X)**-1 and the betas. And that is what PCA does. Statistics textbooks 
usually don't teach this. They just say "multicollinearity is bad".
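
In NumPy/SciPy terms, this is essentially what the rcond/cond arguments 
to the pseudo-inverse and least-squares routines do: singular values 
below the cutoff are treated as zero. A short sketch, reusing X and y 
from above (the 1e-10 threshold is just illustrative):

   import numpy as np
   from scipy import linalg

   # pseudo-inverse of X'X with the small singular values zeroed out,
   # instead of letting 1/s[i] explode
   betas_pinv = np.linalg.pinv(X.T @ X, rcond=1e-10) @ X.T @ y

   # or skip the normal equations and let lstsq apply the same cutoff
   # to the singular values of X itself
   betas_lstsq, _, rank, sv = linalg.lstsq(X, y, cond=1e-10)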

Yes, PCA is used for "dimensionality reduction" and avoiding overfitting. 
But why is overfitting a problem anyway? And why does PCA help? It is 
all entangled: the main issue is always that 1/s[i] is big when s[i] is 
small. Overfitting gives you a lot of these big 1/s values, and then the 
betas you solve for do not reflect the signal in X'X, so the model has 
no predictive power.


Sturla


