[Numpy-discussion] confusion about eigenvector

Thu Feb 28 08:37:00 EST 2008

On Thu, Feb 28, 2008 at 8:17 AM, devnew at gmail.com <devnew at gmail.com> wrote:
> hi all
>  I am learning the PCA method by reading the Turk & Pentland papers etc.
>  While trying out PCA on a set of greyscale images using python and
>  numpy, I tried to create eigenvectors and facespace.
>
>  i have
>   facesarray--- an NXP numpy.ndarray that contains data of images
>        N=numof images,P=pixels in an image
>  avgarray --1XP array containing avg value for each pixel
>   adjustedfaces=facesarray-avgarray
>  adjustedmatrix=matrix(adjustedfaces)
>  adjustedmatrix_trans=adjustedmatrix.transpose()
>  covariancematrix =adjustedmatrix*adjustedmatrix_trans
>  evalues,evect=eigh(covariancematrix)
>
>  after sorting such that most significant eigenvectors are selected.
>  evectmatrix is now my eigenvectors matrix
>
>  here is a sample using 4X3 greyscale images
>
>  evalues
>  [ -1.85852801e-13   6.31143639e+02   3.31182765e+03   5.29077871e+03]
>  evect
>  [[ 0.5        -0.06727772  0.6496399  -0.56871936]
>   [ 0.5        -0.77317718 -0.37697426  0.10043632]
>   [ 0.5         0.27108233  0.31014514  0.76179023]
>   [ 0.5         0.56937257 -0.58281078 -0.29350719]]
>
>  evectmatrix  (sorted according to largest evalue first)
>  [[-0.56871936  0.6496399  -0.06727772  0.5       ]
>   [ 0.10043632 -0.37697426 -0.77317718  0.5       ]
>   [ 0.76179023  0.31014514  0.27108233  0.5       ]
>   [-0.29350719 -0.58281078  0.56937257  0.5       ]]
>
>  then i can create facespace by
>  facespace=evectmat*adjustedfaces
>
>  till now I've been following the steps as mentioned in the PCA
>  tutorial (by Lindsay Smith & others).
>  What I want to know is: in the above evectmatrix, is each row
>  ([-0.56871936  0.6496399  -0.06727772  0.5   ] etc.) an eigenvector,
>  or does a column in the above matrix represent an eigenvector?

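The eigenvectors are in the columns. To convince yourself, look at the
last column of evectmatrix: it is constant (all 0.5's) and corresponds
to the (numerically) zero eigenvalue. This is due to the initial column
centering: every pixel column of adjustedfaces sums to zero, so the
constant vector is mapped to zero by the covariance matrix.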
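
A minimal sketch to check the column convention yourself. The random
data, the seed and the variable names are made up for illustration (they
are not your images), and I use numpy.linalg.eigh here; the idea is just
that cov @ v equals lambda * v only when v is taken as a column:

```python
import numpy as np

rng = np.random.default_rng(0)
faces = rng.random((4, 12))            # hypothetical data: 4 images, 12 pixels each
adjusted = faces - faces.mean(axis=0)  # subtract the average image
cov = adjusted @ adjusted.T            # 4 x 4 matrix, as in the question

evalues, evect = np.linalg.eigh(cov)   # eigenvalues come back in ascending order

# Column i of evect pairs with evalues[i]: cov @ v == lambda * v
for i in range(len(evalues)):
    v = evect[:, i]
    assert np.allclose(cov @ v, evalues[i] * v)

# Sort with the largest eigenvalue first by reordering *columns*
order = np.argsort(evalues)[::-1]
evalues_sorted = evalues[order]
evectmatrix = evect[:, order]

# The last column now belongs to the (numerically) zero eigenvalue
print(evalues_sorted[-1])
print(evectmatrix[:, -1])
```

With four images the last printed vector should have entries of equal
magnitude 0.5 (the overall sign is arbitrary).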

>  to put it differently,
>  should i represent an eigenvector by
>  evectmatrix[i] or by
>  (get_column_i_of(evectmatrix)).transpose()
>
>  if someone can make this clear please do
>  D

BTW:

If your data set is not extremely large, these simple steps should also
give you what you want (not tested):

-------------
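import numpy as np
from scipy import linalg

facearray = np.asarray(facearray, dtype=float)  # in-place centering needs floats
facearray -= facearray.mean(0)  # mean centering (per pixel, over images)
u, s, vt = linalg.svd(facearray, full_matrices=False)
scores = u*s
facespace = vt.T  # columns are the principal axes (eigen-images)
# reconstruction: facearray ~= dot(scores, facespace.T)
# the variance along each component is proportional to s**2
explained_variance = 100*(s**2).cumsum()/(s**2).sum()

# here is how to reconstruct an `eigen-image` from the first component
# You may want to verify the shape, as it depends on how you created the facearray
face_image0 = facespace[:,0].reshape(4,3)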

-----------
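
To relate the SVD route to the eigh route from the question, here is a
small sketch on made-up random data (the array sizes, the seed and all
names are assumptions for illustration). It checks that the nonzero
eigenvalues of the N x N matrix are the squared singular values, and that
the N x N eigenvectors can be mapped to pixel space to give the same
eigen-images up to sign:

```python
import numpy as np

rng = np.random.default_rng(0)
faces = rng.random((4, 12))              # hypothetical data: 4 images, 12 pixels each
centered = faces - faces.mean(axis=0)

# SVD route: rows of vt are the principal axes in pixel space
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# eigh route on the small N x N matrix, sorted largest eigenvalue first
evals, evecs = np.linalg.eigh(centered @ centered.T)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

k = 3  # centering leaves at most N - 1 = 3 nonzero components

# Nonzero eigenvalues equal the squared singular values
assert np.allclose(evals[:k], s[:k] ** 2)

# Map the N-dimensional eigenvectors to P-dimensional eigen-images
eigenfaces = centered.T @ evecs[:, :k] / np.sqrt(evals[:k])   # shape (P, k)

# Each eigenvector is only defined up to sign, so align before comparing
signs = np.sign(np.sum(eigenfaces * vt[:k].T, axis=0))
assert np.allclose(eigenfaces * signs, vt[:k].T)
```

The component belonging to the zero eigenvalue is left out, since
dividing by its square root would not be meaningful.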

In case you have a large dataset (many pixels *and* many images) you
may look into using the ARPACK eigensolver for efficiency (located in
scikits and appearing in the upcoming release of scipy, 0.7)
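
A rough sketch of what that could look like. I am assuming a current
SciPy here, where the ARPACK symmetric solver is exposed as
scipy.sparse.linalg.eigsh; the data, sizes and names are made up for
illustration. Only the k largest eigenpairs are computed:

```python
import numpy as np
from scipy.sparse.linalg import eigsh

rng = np.random.default_rng(0)
faces = rng.random((50, 200))            # hypothetical data: 50 images, 200 pixels each
centered = faces - faces.mean(axis=0)
gram = centered @ centered.T             # 50 x 50 symmetric matrix

k = 3                                    # number of leading components wanted (k < 50)
vals, vecs = eigsh(gram, k=k, which="LM")  # "LM" = largest magnitude

# Put the largest eigenvalue first; eigenvectors are in the columns of vecs
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

print(vals)
```

For a matrix this small a dense eigh is just as good; the iterative
solver should only pay off when the matrix is large and k is small.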

Arnar