[Numpy-discussion] corrcoef of masked array
Jesper Larsen
jl at dmi.dk
Wed May 30 06:02:14 EDT 2007
On Friday 25 May 2007 19:18, Robert Kern wrote:
> Jesper Larsen wrote:
> > Hi numpy users,
> >
> > I have a masked array of dimension (nvariables, nobservations) that
> > contains missing values at arbitrary points. Is it safe to rely on
> > numpy.corrcoef to calculate the correlation coefficients of a masked
> > array (it seems to give reasonable results)?
>
> No, it isn't. There are several different options for estimating
> correlations in the face of missing data, none of which are clearly
> superior to the others. None of them are trivial. None of them are
> implemented in numpy.
Thanks. My previous post was sent a bit too early: reading the code for
corrcoef made it clear to me that it is not safe for use with masked
arrays.

Here is my solution for calculating the correlation coefficients for masked
arrays. For each pair of variables it uses only the observations that are
valid in both. Comments are appreciated:

import numpy as np
from numpy import ma

def macorrcoef(data1, data2):
    """
    Calculates correlation coefficients taking masked out values
    into account.

    It is assumed (but not checked) that data1.shape == data2.shape.
    """
    nv, no = data1.shape
    # Start with a fully masked (nv, nv) result; an entry is unmasked
    # only when a coefficient can actually be computed for it.
    cc = ma.masked_all((nv, nv))
    if no > 1:
        for i in range(nv):
            for j in range(nv):
                # Mask every observation that is missing in either row.
                m = ma.getmaskarray(data1[i,:]) | ma.getmaskarray(data2[j,:])
                d1 = ma.array(data1[i,:], copy=False, mask=m).compressed()
                d2 = ma.array(data2[j,:], copy=False, mask=m).compressed()
                # At least two jointly valid observations are needed.
                if d1.size > 1:
                    c = np.corrcoef(d1, d2)
                    cc[i,j] = c[0,1]
    return cc

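The core step of the function above, for a single pair of series, can be sketched as follows. This is only an illustration of pairwise deletion, which is one of the several possible missing-data strategies and not necessarily the best one. The helper name pairwise_corr and the sample values are made up for this example and are not part of numpy.

```python
import numpy as np
from numpy import ma


def pairwise_corr(x, y):
    """Pearson correlation of two 1-D masked arrays over jointly valid points.

    Hypothetical helper for illustration. Returns ma.masked when fewer
    than two jointly valid observations remain.
    """
    # An observation is usable only if it is unmasked in both series.
    valid = ~(ma.getmaskarray(x) | ma.getmaskarray(y))
    if valid.sum() < 2:
        return ma.masked
    xv = ma.getdata(x)[valid]
    yv = ma.getdata(y)[valid]
    return np.corrcoef(xv, yv)[0, 1]


# Two variables, five observations, with missing values at different points.
a = ma.array([1.0, 2.0, 3.0, 4.0, 5.0], mask=[0, 0, 1, 0, 0])
b = ma.array([2.0, 1.0, 9.0, 4.0, 3.0], mask=[0, 0, 0, 0, 1])

r = pairwise_corr(a, b)

# Only observations 0, 1 and 3 are valid in both series, so the result
# should match a plain corrcoef on those three points.
expected = np.corrcoef([1.0, 2.0, 4.0], [2.0, 1.0, 4.0])[0, 1]

# A series sharing just one valid observation with `a` gives a masked result.
c = ma.array([1.0, 2.0, 3.0, 4.0, 5.0], mask=[1, 1, 1, 1, 0])
too_few = pairwise_corr(a, c)
```

Note that with this approach each entry of the correlation matrix may be based on a different subset of observations, so the entries are not directly comparable when the missing-value patterns differ a lot between variables.
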
- Jesper