[SciPy-User] moving window (2D) correlation coefficient

Zachary Pincus zachary.pincus at yale.edu
Wed Feb 10 15:02:22 EST 2010


> I tried that, but then less clever (just reshaping and transposing);  
> it
> made a copy, and I ended up with an array that took 5*5 times as much
> space as my initial array... And it took a lot of time.
> But I don't see how this reshape would gain me much speed?

Well, the feindishly-clever view (which might not even be possible, I  
haven't thought through it super-clearly) would at least use no more  
memory than the original array.

>  Then you could compute
>> the correlation coefficient between the two arrays directly. Maybe?
>
> Ah, like that. I thought numpy.corrcoef(x,y) only worked on flat x and
> flat y. There is no axis keyword... Is there another function that  
> would
> work on nd arrays?

I'd follow Joseph's lead here and implement your own corrcoef that (in  
the above case) works on (x,y,n,n)-shape arrays, or, in the cython  
case, is just a cdef function that computes it directly.

> Well, speed is more important than readability this time :-) It's
> terabytes of data I'll need to push through this function...

OK, then cython is likely preferable here. Just make sure to properly  
declare the array and the index variables, e.g.:
cdef np.ndarray[np.float32_t, ndim=2, negative_indices=False,  
mode='c'] a = np.asarray(a_in, dtype=np.float32, order='C')
(or could use fortran-order if that's what the arrays are natively)
and make sure after debugging to @cython.boundscheck(False) the  
function.

With these steps, your indexing will be maximally fast.

> What I understood till now (just starting to look at cython) is that  
> in
> this kind of thing, the repeated array sub-indexing (and its bounds
> checking etc.) is the main bottleneck. I presume the corrcoef function
> is pretty much optimized? So I thought if I could do the loop in  
> cython
> and implement some clever code for the window indexing (only replace  
> the
> new elements), I'd relatively simple get the major speed gain. Does  
> that
> sound correct?
> Only still have to find out now how to call the corrcoef function from
> cython...

Definitely implement your own corrcoef in cython in this case. Or just  
inline it in the inner loop of the function you're writing. The killer  
overhead will be in calling a python function on every pixel and  
converting the arguments and return value otherwise.

Zach






More information about the SciPy-User mailing list