[SciPy-User] Q: How to calculate correlation between columns of a matrix, without looping?

Raj rajeev.raizada at gmail.com
Tue Mar 1 14:50:29 EST 2011


Dear SciPy users,

I have a matrix (or, more strictly speaking, an array),
and I want to calculate the correlation between each column
and every other column, i.e. to make a standard correlation matrix.

In Matlab, this is pretty straightforward:
>> m = [ 1 2; -1 3; 0 4]
m =
    1     2
   -1     3
    0     4

>> corr(m,m)
ans =
   1.0000   -0.5000
  -0.5000    1.0000

However, getting this same behaviour out of SciPy/NumPy
is proving to be harder than I expected.

Below are some attempts, and the output that they give.
I also show the results of trying correlations on the transpose of m.
The closest I can get to the desired output
is a 4x4 matrix made up of four tiled copies
of the correct 2x2 correlation matrix.

I could loop through the columns of the matrix,
and calculate each correlation separately,
but that seems like an ugly and inefficient workaround.
Any help or advice would be greatly appreciated.

Raj
-----------------
In [1]: import scipy
In [2]: import numpy
In [3]: m = scipy.array([[ 1, 2],[ -1, 3],[ 0, 4]])
In [4]: m
Out[4]:
array([[ 1,  2],
       [-1,  3],
       [ 0,  4]])

In [5]: numpy.corrcoef(m,m)
Out[5]:
array([[ 1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.]])

In [6]: m_t = scipy.transpose(m)

In [7]: m_t
Out[7]:
array([[ 1, -1,  0],
       [ 2,  3,  4]])

In [8]: numpy.corrcoef(m_t,m_t)
Out[8]:
array([[ 1. , -0.5,  1. , -0.5],
       [-0.5,  1. , -0.5,  1. ],
       [ 1. , -0.5,  1. , -0.5],
       [-0.5,  1. , -0.5,  1. ]])

In [9]: scipy.corrcoef(m,m)
Out[9]:
array([[ 1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.]])

In [10]: scipy.corrcoef(m_t,m_t)
Out[10]:
array([[ 1. , -0.5,  1. , -0.5],
       [-0.5,  1. , -0.5,  1. ],
       [ 1. , -0.5,  1. , -0.5],
       [-0.5,  1. , -0.5,  1. ]])
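
A note on the 4x4 result: as far as I understand numpy.corrcoef, a second
array argument is treated as an additional set of variables and appended
to the first, so passing the same array twice yields four tiled copies of
the 2x2 matrix. The sketch below assumes a reasonably recent NumPy and
passes the array only once, using the rowvar argument to say that the
columns are the variables. The names c, c_t and stacked are just
illustrative.

```python
import numpy as np

m = np.array([[1, 2], [-1, 3], [0, 4]])

# Pass the array once. rowvar=False treats each column as a variable
# and each row as an observation, which matches Matlab's corr(m).
c = np.corrcoef(m, rowvar=False)
print(c)  # approximately [[1, -0.5], [-0.5, 1]]

# Equivalent: transpose so that variables are rows (the default layout).
c_t = np.corrcoef(m.T)

# Passing the array twice appends the second copy as extra variables,
# which is why corrcoef(m_t, m_t) comes out as a 4x4 matrix.
stacked = np.corrcoef(m.T, m.T)
print(stacked.shape)  # (4, 4)
```

With rowvar left at its default, each row is treated as a variable. That
would explain the 6x6 matrix of ones from corrcoef(m, m): each row of m
has only two points, so any pair of rows is perfectly correlated.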

In [11]: import scipy.stats

In [12]: scipy.stats.corrcoef(m,m)
[ various error messages, culminating in...]
ValueError: objects are not aligned

In [13]: scipy.stats.corrcoef(m_t,m_t)
[ various error messages, culminating in...]
ValueError: objects are not aligned

In [14]: scipy.stats.pearsonr(m,m)
[ various error messages, culminating in...]
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [15]: scipy.stats.pearsonr(m_t,m_t)
[ various error messages, culminating in...]
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
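
The pearsonr errors above are consistent with scipy.stats.pearsonr
expecting two 1-D sequences rather than 2-D arrays, so using it here would
mean looping over pairs of columns. As a possible loop-free alternative,
here is a minimal sketch in plain NumPy that centres each column, scales
it to unit length, and takes a single matrix product. The helper name
column_correlations is hypothetical, and the sketch assumes that no
column is constant (a constant column would cause a division by zero).

```python
import numpy as np

def column_correlations(a):
    """Pearson correlation between every pair of columns of a 2-D array.

    Hypothetical helper for illustration; assumes no column is constant.
    """
    a = np.asarray(a, dtype=float)
    centered = a - a.mean(axis=0)                 # subtract each column's mean
    norms = np.sqrt((centered ** 2).sum(axis=0))  # root sum of squares per column
    unit = centered / norms                       # each column now has unit length
    return unit.T @ unit                          # dot products of unit columns

m = np.array([[1, 2], [-1, 3], [0, 4]])
r = column_correlations(m)
print(r)  # approximately [[1, -0.5], [-0.5, 1]]
```

For this m the centred columns are [1, -1, 0] and [-1, 0, 1], whose dot
product is -1 and whose squared lengths are both 2, giving -0.5 off the
diagonal, in agreement with the Matlab output.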


