[SciPy-User] linear algebra: quadratic forms without linalg.inv

Souheil Inati souheil.inati at nyu.edu
Mon Nov 2 11:26:07 EST 2009


On Nov 2, 2009, at 10:38 AM, josef.pktd at gmail.com wrote:

> > snip
>
> It really depends on the application. From the applications I know,
> PCA is used for dimension reduction when there are too many
> regressors, in order to avoid overfitting. The most popular
> reference in econometrics might be
>
> Forecasting Using Principal Components from a Large Number of
> Predictors, James H. Stock and Mark W. Watson, Journal of the
> American Statistical Association, Vol. 97, No. 460 (Dec., 2002),
> pp. 1167-1179.
>
> A similar problem exists in chemometrics, with more regressors than
> observations (at least judging from the descriptions I came across
> when reading about NIPALS).
>
> I don't think that, compared to the big stochastic errors, numerical
> precision plays much of a role. When we have large estimation errors
> in small samples in statistics, we don't have to worry, for example,
> about 1e-15 precision when our sampling errors are on the order of
> 1e-1.
>
> Of course, there are other applications, and I'm working my way slowly
> through the numerical issues.
>
> Josef


Hi Josef,

I have a strong opinion about this, and I am almost certainly in the
minority, but my feeling is this: once you have ill-conditioning, all
bets are off.
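
For concreteness, a quick way to see whether you are in that regime is
to look at the singular values directly. This is only a sketch
(it assumes NumPy; the matrix is a made-up example with one nearly
redundant column):

import numpy as np

np.random.seed(0)
# Made-up design matrix whose third column is almost a linear
# combination of the first two, so the matrix is nearly rank 2.
A = np.random.randn(100, 3)
A[:, 2] = A[:, 0] + A[:, 1] + 1e-12 * np.random.randn(100)

# The condition number (largest over smallest singular value) blows
# up, and the smallest singular value is numerically zero.
print(np.linalg.cond(A))
print(np.linalg.svd(A, compute_uv=False))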

Once the problem is ill-conditioned, there are infinitely many
solutions that match your data in a least-squares sense. You are then
required to say something further about how you want to pick a
particular solution from among those equivalent solutions. SVD/PCA is
a procedure for finding the minimum-two-norm solution that fits the
data, and the minimum-two-norm solution is unique. For the general
case, SVD is the only method with a proper theory behind it. There is
no proper theory for anything else, PERIOD.
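
To illustrate that uniqueness, here is a small sketch (assuming a
reasonably recent NumPy; the underdetermined system is made up). Both
lstsq and pinv go through the SVD and agree on the minimum-two-norm
solution, and adding any null-space direction still fits the data but
only increases the norm:

import numpy as np

np.random.seed(0)
# Underdetermined problem: 5 equations, 20 unknowns, so infinitely
# many x satisfy Ax = b exactly.
A = np.random.randn(5, 20)
b = np.random.randn(5)

# lstsq and pinv both use the SVD and return the unique
# minimum-two-norm solution.
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]
x_pinv = np.dot(np.linalg.pinv(A), b)
print(np.allclose(x_lstsq, x_pinv))        # True: same solution
print(np.allclose(np.dot(A, x_lstsq), b))  # fits the data exactly

# Any null-space direction gives another exact solution, but with a
# strictly larger two-norm.
U, s, Vt = np.linalg.svd(A)
z = Vt[-1]                                 # A z is numerically zero
x_other = x_lstsq + z
print(np.allclose(np.dot(A, x_other), b))  # still fits
print(np.linalg.norm(x_other) > np.linalg.norm(x_lstsq))  # True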

The only other useful thing one can say is that if you expect your
solution to be sparse, then you can use the recently developed theory
of compressed sensing (Tao and Candes). It says that the
minimum-one-norm solution is best in a statistical sense and provides
an algorithm to find it. The difference between the SVD and compressed
sensing is that the former spreads the power out evenly among the
coefficients, while the latter picks the solution that maximizes the
magnitude of some of the coefficients and sets the others to zero
(i.e. picks a sparse answer).
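
To make the contrast concrete, here is a sketch of the one-norm route
via basis pursuit, cast as a linear program (this assumes a SciPy
recent enough to have scipy.optimize.linprog, and the problem sizes
are made up; in practice you would use a dedicated solver):

import numpy as np
from scipy.optimize import linprog

np.random.seed(1)
m, n, k = 30, 100, 5           # measurements, unknowns, nonzeros
A = np.random.randn(m, n)
x_true = np.zeros(n)
x_true[:k] = np.random.randn(k)
b = np.dot(A, x_true)

# Basis pursuit: minimize ||x||_1 subject to Ax = b. With the split
# x = u - v, u >= 0, v >= 0 this becomes the linear program
#   minimize sum(u) + sum(v)  subject to  [A, -A] [u; v] = b.
c = np.ones(2 * n)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b, bounds=(0, None))
x_l1 = res.x[:n] - res.x[n:]

# The minimum-two-norm (SVD) answer fits the same data but smears the
# energy over all coefficients instead of finding the sparse one.
x_l2 = np.dot(np.linalg.pinv(A), b)
print(np.allclose(x_l1, x_true, atol=1e-6))   # typically True here
print(np.sum(np.abs(x_l1) > 1e-6), np.sum(np.abs(x_l2) > 1e-6))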

So if your problem is ill-conditioned, then you are in trouble. Your
only legitimate options are to use the SVD and pick the
minimum-two-norm answer, or to use compressed sensing and pick the
minimum-one-norm answer. Everything else is complete nonsense.

Cheers,
Souheil
