[Numpy-discussion] performance matrix multiplication vs. matlab

Gael Varoquaux gael.varoquaux at normalesup.org
Mon Jun 8 01:32:10 EDT 2009


On Mon, Jun 08, 2009 at 12:29:08AM -0400, David Warde-Farley wrote:
> On 7-Jun-09, at 6:12 AM, Gael Varoquaux wrote:

> > Well, I do bootstrapping of PCAs, that is SVDs. I can tell you, it makes
> > a big difference, especially since I have 8 cores.

> Just curious Gael: how many PCs are you retaining? Have you tried
> iterative methods (e.g. the EM algorithm for PCA)?

I am using the heuristic described in
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4562996

We have very noisy and long time series. My experience is that most
model-based heuristics for choosing the number of PCs to retain keep far
too many on this problem (their estimates simply keep diverging if I add
noise at the end of the time series). The algorithm we use gives us ~50
interesting PCs (each with 50 000 dimensions). That happens to be about
right based on our experience with the signal. However, being fairly new
to statistics, I am not aware of the EM algorithm that you mention. I'd
be interested in a reference, to see whether I can use that algorithm.
The PCA bootstrap is time-consuming.
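
For readers unfamiliar with the setup, here is a minimal sketch of what
bootstrapping a PCA through the SVD can look like in NumPy. It is not the
heuristic from the paper linked above, and several things in it are
assumptions made for illustration: the function names pca_svd and
bootstrap_pca, the layout of X as (samples x features), and the choice to
resample rows independently with replacement (which ignores any temporal
correlation in a real time series). Each bootstrap replicate needs its own
SVD, which is why the procedure is expensive and why a multithreaded
LAPACK helps.

```python
import numpy as np


def pca_svd(X, n_components):
    """Leading principal axes and singular values of X (samples x features)."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Thin SVD: the rows of Vt are the principal axes, sorted by singular value.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_components], s[:n_components]


def bootstrap_pca(X, n_components, n_boot=20, seed=0):
    """Recompute the PCA on row-resampled copies of X (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    ref_axes, _ = pca_svd(X, n_components)
    boot_axes = np.empty((n_boot, n_components, n_features))
    for b in range(n_boot):
        # Assumption: rows are resampled independently, with replacement.
        idx = rng.integers(0, n_samples, size=n_samples)
        axes, _ = pca_svd(X[idx], n_components)
        # Singular vectors are defined up to sign; align with the reference.
        signs = np.sign(np.sum(axes * ref_axes, axis=1))
        signs[signs == 0] = 1.0
        boot_axes[b] = axes * signs[:, None]
    return ref_axes, boot_axes
```

The spread of boot_axes across replicates (for example, its standard
deviation along the first axis) gives a rough picture of how stable each
component is. With strongly correlated samples a block bootstrap would be
a more defensible resampling scheme than the one assumed here.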

Thanks,

Gaël 


