[SciPy-User] moving window (2D) correlation coefficient
Zachary Pincus
zachary.pincus at yale.edu
Wed Feb 10 15:02:22 EST 2010
> I tried that, but then less clever (just reshaping and transposing);
> it
> made a copy, and I ended up with an array that took 5*5 times as much
> space as my initial array... And it took a lot of time.
> But I don't see how this reshape would gain me much speed?
Well, the feindishly-clever view (which might not even be possible, I
haven't thought through it super-clearly) would at least use no more
memory than the original array.
> Then you could compute
>> the correlation coefficient between the two arrays directly. Maybe?
>
> Ah, like that. I thought numpy.corrcoef(x,y) only worked on flat x and
> flat y. There is no axis keyword... Is there another function that
> would
> work on nd arrays?
I'd follow Joseph's lead here and implement your own corrcoef that (in
the above case) works on (x,y,n,n)-shape arrays, or, in the cython
case, is just a cdef function that computes it directly.
> Well, speed is more important than readability this time :-) It's
> terabytes of data I'll need to push through this function...
OK, then cython is likely preferable here. Just make sure to properly
declare the array and the index variables, e.g.:
cdef np.ndarray[np.float32_t, ndim=2, negative_indices=False,
mode='c'] a = np.asarray(a_in, dtype=np.float32, order='C')
(or could use fortran-order if that's what the arrays are natively)
and make sure after debugging to @cython.boundscheck(False) the
function.
With these steps, your indexing will be maximally fast.
> What I understood till now (just starting to look at cython) is that
> in
> this kind of thing, the repeated array sub-indexing (and its bounds
> checking etc.) is the main bottleneck. I presume the corrcoef function
> is pretty much optimized? So I thought if I could do the loop in
> cython
> and implement some clever code for the window indexing (only replace
> the
> new elements), I'd relatively simple get the major speed gain. Does
> that
> sound correct?
> Only still have to find out now how to call the corrcoef function from
> cython...
Definitely implement your own corrcoef in cython in this case. Or just
inline it in the inner loop of the function you're writing. The killer
overhead will be in calling a python function on every pixel and
converting the arguments and return value otherwise.
Zach
More information about the SciPy-User
mailing list