[scikit-learn] Finding the PC that captures a specific variable

Mahmood Naderan mahmood.nt at gmail.com
Fri Jan 22 15:48:46 EST 2021


Hi
Thanks for the replies. I read about the available functions in the
PCA section of the documentation. Consider the following code:

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import pandas as pd

# x is the raw feature matrix; targets holds the kernel label of each row
x = StandardScaler().fit_transform(x)
pca = PCA()
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data=principalComponents)
loadings = pca.components_  # shape: (n_components, n_features)
finalDf = pd.concat([principalDf, pd.DataFrame(targets, columns=['kernel'])], axis=1)
print("First and second observations\n", finalDf.loc[0:1])
print("loadings[0] and loadings[1]\n", loadings[0], loadings[1])
print("explained_variance_ratio_\n", pca.explained_variance_ratio_)


The output looks like this:

First and second observations
          0         1         2         3         4  kernel
0  2.959846 -0.184307 -0.100236  0.533735 -0.002227   ELEC1
1  0.390313  1.805239  0.029688 -0.502359 -0.002350  ELECT2
loadings[0] and loadings[1]
[ 0.21808984  0.49137412  0.46511098  0.49735819  0.49728754]
[-0.94878375 -0.01257726  0.29718078  0.07493325  0.07562934]
explained_variance_ratio_
[7.80626876e-01 1.79854061e-01 2.50729844e-02 1.44436687e-02 2.40984767e-06]



As you can see, for the two kernels named ELEC1 and ELECT2 there are
five PCs, numbered 0 to 4. Based on the numbers in the loadings, I
expect that loadings[0], which I take to correspond to the first
variable, is best represented in the PC1-PC2 plane
(0.49137412, 0.46511098), whereas loadings[1], corresponding to the
second variable, is best represented in the PC0-PC2 plane
(-0.94878375, 0.29718078). Is this understanding correct?

I also don't understand what explained_variance_ratio_ is telling me here.
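
To make the question concrete, here is a small sketch of what I am
trying to compute, continuing from the fitted pca above. Picking the
PC with the largest absolute loading per variable is my own assumption
about what "captures" means, not something I found in the docs:

import numpy as np

# components_ has shape (n_components, n_features): row i holds the
# loadings of PC i on every original variable. For each variable
# (column), find the PC where its absolute loading is largest.
best_pc = np.abs(pca.components_).argmax(axis=0)
for feature, pc in enumerate(best_pc):
    print(f"variable {feature}: largest absolute loading on PC{pc}")

# explained_variance_ratio_[i] is the fraction of the total variance
# in x that PC i alone accounts for; the entries sum to 1.
print(pca.explained_variance_ratio_.cumsum())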


Regards,
Mahmood

On Fri, Jan 22, 2021 at 11:52 AM Nicolas Hug <niourf at gmail.com> wrote:
>
> Hi Mahmood,
>
> There are different pieces of info that you can get from PCA:
>
> 1. How important a given PC is for reconstructing the entire dataset ->
> this is given by explained_variance_ratio_, as Guillaume suggested.
>
> 2. What the contribution of each feature to each PC is (remember that a
> PC is a linear combination of all the features, i.e. PC_1 = X_1 .
> alpha_11 + X_2 . alpha_12 + ... + X_m . alpha_1m). The alpha_ij are what
> you're looking for, and they are given in the components_ matrix, which
> is an n_components x n_features matrix.
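>
> A minimal sketch of point 2 (assuming a PCA named pca already fitted
> on a standardized matrix X with the default whiten=False; the
> variable names here are illustrative):
>
> import numpy as np
>
> # Row i of components_ holds the alpha_ij for PC i, so each score
> # column is exactly that linear combination of the centered features.
> scores = pca.transform(X)                     # (n_samples, n_components)
> manual = (X - pca.mean_) @ pca.components_.T  # same combination by hand
> print(np.allclose(scores, manual))            # -> True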
>
> Nicolas
>
> On 1/22/21 9:13 AM, Mahmood Naderan wrote:
> > Hi
> > I have a question about PCA: how can we determine which factor
> > (principal component) best captures a given variable X? For example,
> > a variable may have a low weight in the first PC but a higher weight
> > in the fifth PC.
> >
> > When I use the PCA from scikit-learn, I have to work with the PCs
> > manually, so I may miss the fact that although a variable is weak in
> > the PC1-PC2 plot, it may be strong in the PC4-PC5 plot.
> >
> > Any comment on that?
> >
> > Regards,
> > Mahmood
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn

