[Numpy-discussion] Correlation filter
Keith Goodman
kwgoodman at gmail.com
Fri Nov 20 13:51:01 EST 2009
On Fri, Nov 20, 2009 at 8:53 AM, <josef.pktd at gmail.com> wrote:
> scipy.signal.correlate would be fast, but it will not be easy to
> subtract the correct moving mean. Subtracting a standard moving mean
> would subtract different values for each observation in the window.
>
> One possibility would be to look at a moving regression and just take
> the estimated slope parameter. John D'Errico
> http://www.mathworks.com/matlabcentral/fileexchange/16997-movingslope
> (BSD licensed)
> uses a very nice trick with pinv to get the filter to calculate a
> moving slope coefficient. I read the source but didn't try it out and
> didn't try to figure out how the pinv trick exactly works.
> If this can be adapted to your case, then this would be the fastest I
> can think of (pinv and scipy.signal.correlate would do everything in
> C, or maybe one (500) loop might still be necessary)
>
> For just getting a ranking on a linear relationship, there might be
> other tricks possible, local linear regression, ... (?), but I never
> tried. Also, I think with one time loop, you can do all cross section
> regressions at the same time, without tricks.
Those sound like good ideas.
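For reference, here is a minimal sketch of the moving-slope idea as I understand it from the description above. It is not a port of movingslope. The function name moving_slope and the unit-spaced time axis are assumptions, and np.correlate is used in place of scipy.signal.correlate only to keep the example short. The second row of the pseudo-inverse of the design matrix [1, t] holds the weights that turn a window of y into its least-squares slope, so a single correlation with those weights gives every window's slope.

```python
import numpy as np

def moving_slope(y, window):
    """Least-squares slope of y over each full window (hypothetical helper)."""
    t = np.arange(window, dtype=float)         # assumed unit-spaced time axis
    A = np.column_stack([np.ones(window), t])  # design matrix [1, t]
    w = np.linalg.pinv(A)[1]                   # weights for the slope coefficient
    # out[k] = sum_n y[k + n] * w[n], i.e. the slope of y[k:k + window]
    return np.correlate(y, w, mode="valid")

# Exactly linear data, so every window's slope is 3 up to rounding.
y_lin = 3.0 * np.arange(20) + 2.0
slopes = moving_slope(y_lin, 5)
```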
I was able to get rid of the for loop (and cut the time in half) by
using lfilter from scipy:
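import numpy as np
from scipy.signal import lfilter

def corr3(x, y):
    # Standardize x (zero mean, unit std); x - x.mean() makes a copy.
    x = x - x.mean()
    x /= x.std()
    nx = x.size
    one = np.ones(nx)
    # Filter y with x and with ones: windowed sums over nx samples.
    xy = lfilter(x, 1, y)
    sy = lfilter(one, 1, y)
    sy2 = lfilter(one, 1, y*y)
    # The first nx - 1 outputs come from partially filled windows.
    d = xy / np.sqrt(nx * sy2 - sy * sy)
    return d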
(Somehow I managed to flip the sign of the correlation in corr3.)
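A likely explanation for the flip, offered as a sketch rather than a confirmed diagnosis: lfilter(b, 1, y) computes a convolution, so each window of y is paired with b in reverse order. For a monotone x such as a linear trend, reversing x negates the correlation. The check below compares corr3 against np.corrcoef at one index; the random test data and the index i are arbitrary choices made for illustration.

```python
import numpy as np
from scipy.signal import lfilter

def corr3(x, y):
    x = x - x.mean()
    x = x / x.std()
    nx = x.size
    one = np.ones(nx)
    xy = lfilter(x, 1, y)
    sy = lfilter(one, 1, y)
    sy2 = lfilter(one, 1, y * y)
    return xy / np.sqrt(nx * sy2 - sy * sy)

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
y = rng.standard_normal(50)

i = 20                                    # any index >= x.size - 1
window = y[i - x.size + 1 : i + 1]        # the nx samples ending at i

as_posted = corr3(x, y)[i]                # matches corr(x reversed, window)
reversed_ref = np.corrcoef(x[::-1], window)[0, 1]

unflipped = corr3(x[::-1], y)[i]          # reverse x first to undo the flip
direct_ref = np.corrcoef(x, window)[0, 1]
```

If this is what is happening, passing x[::-1] to corr3 gives the correlation in the expected orientation without touching the rest of the function.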
I can trim a little more time by doing some of the operations in place
but then the code becomes hard to read. This solution will work well
for me since I can move much of the calculation outside of the
function and reuse it across many calls (y does not change often
compared to x).
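One way to arrange that reuse is sketched below. The name make_corr and the closure layout are assumptions for illustration, not the code actually used. Everything that depends only on y and the window length is computed once, and each new x costs a single lfilter call. The output keeps the same orientation as corr3.

```python
import numpy as np
from scipy.signal import lfilter

def make_corr(y, nx):
    """Precompute the y-only terms for windows of length nx (hypothetical helper)."""
    one = np.ones(nx)
    sy = lfilter(one, 1, y)
    sy2 = lfilter(one, 1, y * y)
    denom = np.sqrt(nx * sy2 - sy * sy)    # depends on y and nx only

    def corr(x):
        if x.size != nx:
            raise ValueError("x must have length nx")
        xs = (x - x.mean()) / x.std()
        return lfilter(xs, 1, y) / denom   # one filter pass per x

    return corr

rng = np.random.default_rng(0)
y = rng.standard_normal(200)
corr_y = make_corr(y, 5)                   # built once per y

# Reused for many different x without recomputing the y terms.
results = [corr_y(rng.standard_normal(5)) for _ in range(3)]
```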