[Tutor] Help- Regarding python

Oscar Benjamin oscar.j.benjamin at gmail.com
Tue Feb 5 01:21:24 CET 2013


On 4 February 2013 06:24, Gayathri S <gayathri.s112 at gmail.com> wrote:
> Hi All....!
>             If i have data set like this means...
>
> 3626,5000,2918,5000,2353,2334,2642,1730,1687,1695,1717,1744,593,502,493,504,449,431,444,444,429,10
> 438,498,3626,3629,5000,2918,5000,2640,2334,2639,1696,1687,1695,1717,1744,592,502,493,504,449,431,444,441,429,10
> 439,498,3626,3629,5000,2918,5000,2633,2334,2645,1705,1686,1694,1719,1744,589,502,493,504,446,431,444,444,430,10
> 440,5000,3627,3628,5000,2919,3028,2346,2330,2638,1727,1684,1692,1714,1745,588,501,492,504,451,433,446,444,432,10
> 444,5021,3631,3634,5000,2919,5000,2626,2327,2638,1698,1680,1688,1709,1740,595,500,491,503,453,436,448,444,436,10
> 451,5025,3635,3639,5000,2920,3027,2620,2323,2632,1706,1673,1681,1703,753,595,499,491,502,457,440,453,454,442,20
> 458,5022,3640,3644,5000,2922,5000,2346,2321,2628,1688,1666,1674,1696,744,590,496.

PCA only makes sense for multivariate data: your data should be a set
of vectors *all of the same length*. I'll assume that you were just
being lazy when you posted it and that you didn't bother to copy the
first and last lines properly...

[snip]
>
> Shall i use the following code for doing PCA on given input? could you tell
> me?

This code you posted is all screwed up. It will give you errors if you
try to run it.

Also I don't really know what you mean by "doing PCA". The code below
transforms your data into PCA space and plots a 2D scatter plot using
the first two principal components.

#!/usr/bin/env python
import numpy as np
from matplotlib import pyplot as plt

data = np.array([
    [438,498,3626,3629,5000,2918,5000,2640,2334,2639,1696,1687,1695,1717,1744,592,502,493,504,449,431,444,441,429,10],
    [439,498,3626,3629,5000,2918,5000,2633,2334,2645,1705,1686,1694,1719,1744,589,502,493,504,446,431,444,444,430,10],
    [440,5000,3627,3628,5000,2919,3028,2346,2330,2638,1727,1684,1692,1714,1745,588,501,492,504,451,433,446,444,432,10],
    [444,5021,3631,3634,5000,2919,5000,2626,2327,2638,1698,1680,1688,1709,1740,595,500,491,503,453,436,448,444,436,10],
    [451,5025,3635,3639,5000,2920,3027,2620,2323,2632,1706,1673,1681,1703,753,595,499,491,502,457,440,453,454,442,20],
])

# Compute the eigenvalues and vectors of the covariance matrix
C = np.cov(data.T)
eigenvalues, eigenvectors = np.linalg.eig(C)

# 2D PCA - get the two eigenvectors with the largest eigenvalues
v1, v2 = eigenvectors[:,:2].T
# Project the data onto the two principal components
data_pc1 = [np.dot(v1, d) for d in data]
data_pc2 = [np.dot(v2, d) for d in data]

# Scatter plot in PCA space
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(data_pc1, data_pc2, 'x')
ax.set_xlabel(r'$PC_1$')
ax.set_ylabel(r'$PC_2$')
ax.legend(['data'])
plt.show()


Oscar


More information about the Tutor mailing list