[Numpy-discussion] Correlation filter

Fri Nov 20 10:51:10 EST 2009

I have a short 1d array x and a large 2d array y. I'd like to locate
the places in the y array that are most like (correlated to) the x
array.

My first attempt, corr1, is too slow (13.3 s per call on the benchmark
below). My second attempt, corr2, is faster (60.5 ms per call) but
still slow.

I reuse the same y many times, so my third attempt will probably be to
calculate the moving mean of y outside of the function. But before
doing that, I was wondering if there is any existing code that could
help me. It seems like this would be a common filter-type operation.
Or could stacking several filter operations like mean and product do
the trick?
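
A rough sketch of what that third attempt might look like, assuming
plain NumPy and cumulative sums for the moving statistics. The names
moving_sum, moving_mean_std and corr3 are made up for illustration;
they are not existing NumPy functions. One thing that falls out of
writing it down: once x is centered, the moving mean of y cancels out
of the numerator, so only the moving standard deviation of y has to be
passed in.

```python
import numpy as np

def moving_sum(a, n):
    """Sum of each window of n consecutive elements along the last axis.

    The result has a.shape[-1] - n + 1 entries along the last axis.
    """
    c = np.cumsum(a, axis=-1)
    out = c[..., n - 1:].copy()
    out[..., 1:] -= c[..., :-n]
    return out

def moving_mean_std(y, n):
    """Moving mean and population std of y along rows (depends only on y, n)."""
    mean = moving_sum(y, n) / n
    var = moving_sum(y * y, n) / n - mean ** 2
    # Rounding can push a tiny variance slightly below zero.
    std = np.sqrt(np.maximum(var, 0.0))
    return mean, std

def corr3(x, y, y_std):
    """Correlation of x with each window of y, given the precomputed y_std."""
    n = x.shape[0]
    m = y.shape[1]
    xc = x - x.mean()
    xc /= x.std()
    # xc sums to zero, so sum(xc * (window - window_mean)) == sum(xc * window):
    # the moving mean of y is not needed here.
    num = sum(xc[k] * y[:, k:m - n + 1 + k] for k in range(n))
    d = np.full(y.shape, np.nan)
    d[:, n - 1:] = num / (n * y_std)
    return d
```

The idea would be to call moving_mean_std(y, len(x)) once per y and
window length, keep the returned std, and pass it to corr3 for every x
of that length. Cumulative sums can lose precision when y has a large
mean relative to its spread, so this is a starting point rather than
something robust.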

I don't need the actual correlation. I just need an output that
preserves the ranking of the correlation. For benchmarking I am using
x of shape (5,) and y of shape (500,500):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.random.randn(500, 500)

def corr1(x, y):
    # Columns before the first full window stay NaN.
    d = np.nan * np.ones_like(y)
    for i in range(y.shape[0]):
        yi = y[i,:]
        for j in range(x.shape[0]-1, y.shape[1]):
            # Window of len(x) elements ending at column j.
            yj = yi[j+1-x.shape[0]:j+1]
            d[i,j] = np.corrcoef(x, yj)[0,1]
    return d

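def corr2(x, y):
    dot = np.dot
    sqrt = np.sqrt
    d = np.nan * np.ones_like(y)
    # Standardize x once: zero mean, unit (population) standard deviation.
    x = x - x.mean()
    x /= x.std()
    x = np.tile(x, (y.shape[0],1))
    nx = x.shape[1]
    # dot(a, one) is the mean of each row of a.
    one = np.ones(nx) / nx
    for i in range(nx-1, y.shape[1]):
        # Window of nx columns ending at column i, for all rows at once.
        yi = y[:,i+1-nx:i+1]
        # Standardize each row of the window.
        yi = yi - dot(yi, one).reshape(-1,1)
        yi /= sqrt(dot(yi*yi, one)).reshape(-1,1)
        # The mean of the product of standardized values is the correlation.
        d[:,i] = dot(x * yi, one)
    return d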

>> timeit corr1(x,y)
10 loops, best of 3: 13.3 s per loop
>> timeit corr2(x,y)
10 loops, best of 3: 60.5 ms per loop
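
Another possibility is to drop the Python loop altogether by viewing y
as a stack of overlapping windows. The sketch below assumes a NumPy
recent enough to provide numpy.lib.stride_tricks.sliding_window_view
(1.20 or later); the function name corr_windows is made up for
illustration. It is untimed, so whether it actually beats corr2 would
need to be measured.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def corr_windows(x, y):
    """Correlation of x with every length-len(x) window along the rows of y."""
    n = x.shape[0]
    # Read-only view of shape (rows, cols - n + 1, n); no data is copied here.
    w = sliding_window_view(y, n, axis=1)
    xc = x - x.mean()
    # xc sums to zero, so the window means cancel out of the numerator.
    num = w @ xc
    d = np.full(y.shape, np.nan)
    # Population std on both sides, matching what np.corrcoef returns.
    d[:, n - 1:] = num / (n * x.std() * w.std(axis=-1))
    return d
```

Since only the ranking matters, the positive constant n * x.std() could
be left out of the denominator without changing the order of the
results. The per-window std of y cannot be dropped, because it differs
from window to window.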