[Numpy-discussion] Correlation filter

Keith Goodman kwgoodman at gmail.com
Fri Nov 20 13:51:01 EST 2009


On Fri, Nov 20, 2009 at 8:53 AM,  <josef.pktd at gmail.com> wrote:
> scipy.signal.correlate  would be fast, but it will not be easy to
> subtract the correct moving mean. Subtracting a standard moving mean
> would subtract different values for each observation in the window.
>
> One possibility would be to look at a moving regression and just take
> the estimated slope parameter. John D'Errico
> http://www.mathworks.com/matlabcentral/fileexchange/16997-movingslope
> (BSD licensed)
> uses a very nice trick with pinv to get the filter to calculate a
> moving slope coefficient. I read the source but didn't try it out and
> didn't try to figure out how the pinv trick exactly works.
> If this can be adapted to your case, then this would be the fastest I
> can think of (pinv and scipy.signal.correlate would do everything in
> C, or maybe one (500) loop might still be necessary)
>
> For just getting a ranking on a linear relationship, there might be
> other tricks possible, local linear regression, ... (?), but I never
> tried. Also, I think with one time loop, you can do all cross section
> regressions at the same time, without tricks.

Those sound like good ideas.

I was able to get rid of the for loop (and cut the time in half) by
using lfilter from scipy:

def corr3(x, y):
    x = x - x.mean()
    x /= x.std()
    nx = x.size
    one = np.ones(nx)
    xy = lfilter(x, 1, y)
    sy = lfilter(one, 1, y)
    sy2 = lfilter(one, 1, y*y)
    d = xy / np.sqrt(nx * sy2 - sy * sy)
    return d

(Somehow I managed to flip the sign of the correlation in corr3.)

I can trim a little more time by doing some of the operations in place
but then the code becomes hard to read. This solution will work well
for me since I can move much of the calculation outside of the
function and reuse it across many calls (y does not change often
compared to x).



More information about the NumPy-Discussion mailing list