[SciPy-User] Q: How to calculate correlation between columns of a matrix, without looping?
Raj
rajeev.raizada at gmail.com
Tue Mar 1 14:50:29 EST 2011
Dear SciPy users,
I have a matrix (or, more strictly speaking, an array),
and I want to calculate the correlation between each column
and every other column, i.e. to make a standard correlation matrix.
In Matlab, this is pretty straightforward:
>> m = [ 1 2; -1 3; 0 4]
m =
     1     2
    -1     3
     0     4
>> corr(m,m)
ans =
    1.0000   -0.5000
   -0.5000    1.0000
However, getting this same behaviour out of SciPy/NumPy
is proving to be harder than I expected.
Below are some attempts, and the output that they give.
I also show the results of trying correlations on the transpose of m.
The closest I can get to the desired output
is a 4x4 matrix made up of four tiled copies
of the correct 2x2 correlation matrix.
I could loop through the columns of the matrix,
and calculate each correlation separately,
but that seems like an ugly and inefficient workaround.
Any help or advice greatly appreciated,
Raj
-----------------
In [1]: import scipy
In [2]: import numpy
In [3]: m = scipy.array([[ 1, 2],[ -1, 3],[ 0, 4]])
In [4]: m
Out[4]:
array([[ 1, 2],
       [-1, 3],
       [ 0, 4]])
In [5]: numpy.corrcoef(m,m)
Out[5]:
array([[ 1., 1., 1., 1., 1., 1.],
       [ 1., 1., 1., 1., 1., 1.],
       [ 1., 1., 1., 1., 1., 1.],
       [ 1., 1., 1., 1., 1., 1.],
       [ 1., 1., 1., 1., 1., 1.],
       [ 1., 1., 1., 1., 1., 1.]])
In [6]: m_t = scipy.transpose(m)
In [7]: m_t
Out[7]:
array([[ 1, -1, 0],
       [ 2, 3, 4]])
In [8]: numpy.corrcoef(m_t,m_t)
Out[8]:
array([[ 1. , -0.5, 1. , -0.5],
       [-0.5, 1. , -0.5, 1. ],
       [ 1. , -0.5, 1. , -0.5],
       [-0.5, 1. , -0.5, 1. ]])
In [9]: scipy.corrcoef(m,m)
Out[9]:
array([[ 1., 1., 1., 1., 1., 1.],
       [ 1., 1., 1., 1., 1., 1.],
       [ 1., 1., 1., 1., 1., 1.],
       [ 1., 1., 1., 1., 1., 1.],
       [ 1., 1., 1., 1., 1., 1.],
       [ 1., 1., 1., 1., 1., 1.]])
In [10]: scipy.corrcoef(m_t,m_t)
Out[10]:
array([[ 1. , -0.5, 1. , -0.5],
       [-0.5, 1. , -0.5, 1. ],
       [ 1. , -0.5, 1. , -0.5],
       [-0.5, 1. , -0.5, 1. ]])
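
The tiled output appears to follow from how numpy.corrcoef reads its arguments. By default it treats each row as a variable (rowvar=True), and a second array argument is appended as additional variables, so corrcoef(m_t, m_t) correlates four variables rather than two. A minimal sketch, assuming a NumPy version in which corrcoef accepts the rowvar keyword, passes the array only once and declares the columns to be the variables:

```python
import numpy as np

m = np.array([[1, 2], [-1, 3], [0, 4]])

# Pass the array once; rowvar=False tells corrcoef that each
# column (not each row) holds one variable.
c = np.corrcoef(m, rowvar=False)

# Equivalent form: transpose so that each row is a variable.
c_t = np.corrcoef(m.T)

print(c)
```

Either call should give the 2x2 matrix with -0.5 off the diagonal, matching the Matlab corr(m,m) result above.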
In [11]: import scipy.stats
In [12]: scipy.stats.corrcoef(m,m)
[ various error messages, culminating in...]
ValueError: objects are not aligned
In [13]: scipy.stats.corrcoef(m_t,m_t)
[ various error messages, culminating in...]
ValueError: objects are not aligned
In [14]: scipy.stats.pearsonr(m,m)
[ various error messages, culminating in...]
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
In [15]: scipy.stats.pearsonr(m_t,m_t)
[ various error messages, culminating in...]
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
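
The pearsonr errors are probably due to that function expecting a pair of 1-D sequences rather than 2-D arrays, at least in the version used here. For the loop-free calculation asked about above, one possible sketch works straight from the definition of the Pearson correlation: standardize each column, then take a single matrix product. The helper name column_corr is hypothetical, not a NumPy or SciPy function.

```python
import numpy as np

def column_corr(a):
    """Pearson correlation between every pair of columns of a 2-D array.

    Hypothetical helper for illustration only. A constant column has
    zero standard deviation and will produce nan entries.
    """
    a = np.asarray(a, dtype=float)
    centered = a - a.mean(axis=0)      # subtract each column's mean
    z = centered / a.std(axis=0)       # divide by each column's population std
    return z.T @ z / a.shape[0]        # average product of z-scores

m = np.array([[1, 2], [-1, 3], [0, 4]])
r = column_corr(m)
print(r)
```

Because the same normalization is used in the numerator and the denominator, the result should agree with numpy.corrcoef(m, rowvar=False) up to rounding.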