[SciPy-Dev] conv2 and normxcorr2

Frédéric Bastien nouiz at nouiz.org
Mon Jan 6 16:45:03 EST 2014


Hi,

I implemented a faster CPU convolution in Theano. The code isn't
easily portable... so here are the optimizations that I recall being
important.

1) In the inner loop, there is an if. This is very bad speed-wise. We
can replace that inner loop with 3 consecutive inner loops: one for
before the image (when we pad with 0 or something else), a second for
when we are inside the image, and a last one for after it. In the
valid mode, the first and last loops will be empty.
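A minimal sketch of that loop split in C, for a 1-D zero-padded
"same"-mode correlation (the function name and shapes are
illustrative, not Theano's actual code): instead of testing the bounds
on every tap, compute the in-image tap range once per output element.

```c
/* "same"-mode 1-D correlation with zero padding.
   The branchy version would test bounds inside the tap loop;
   here the tap loop is split into before / inside / after. */
static void corr1d_same(const float *x, int n,
                        const float *k, int klen,
                        float *out)
{
    int half = klen / 2;
    for (int i = 0; i < n; ++i) {
        /* taps j such that i - half + j lies inside [0, n) */
        int jlo = half - i;            /* first in-image tap */
        if (jlo < 0) jlo = 0;
        int jhi = n - i + half;        /* one past the last in-image tap */
        if (jhi > klen) jhi = klen;

        float acc = 0.0f;
        /* loop 1: taps before the image -> zero-padded, nothing to add */
        /* loop 2: taps inside the image -> no bounds check needed */
        for (int j = jlo; j < jhi; ++j)
            acc += x[i - half + j] * k[j];
        /* loop 3: taps after the image -> zero-padded, nothing to add */
        out[i] = acc;
    }
}
```

With zero padding the first and last loops contribute nothing and
vanish entirely; with constant padding they would add `pad * k[j]`
without ever touching `x`.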

2) Don't copy data! This is very slow and not needed in many cases.
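One way to honor this when the caller hands you a non-contiguous
(sliced or transposed) array is to walk its strides directly, as the
NumPy C-API exposes them, instead of first copying into a contiguous
scratch buffer. A toy sketch (strides here are in elements, while
NumPy's are in bytes):

```c
#include <stddef.h>

/* Sum every element of a 2-D strided view in place.
   No contiguous copy of the view is ever made. */
static double strided_sum(const double *base, int rows, int cols,
                          ptrdiff_t rstride, ptrdiff_t cstride)
{
    double acc = 0.0;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            acc += base[r * rstride + c * cstride];
    return acc;
}
```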

3) Don't use a jump table of function calls just to do a
multiplication. This is used to make the code work for all dtypes,
but it needs a separate code path for each dtype. Making a
pseudo-function call just to do one multiplication is very slow.
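A hypothetical illustration of the two patterns in C (not Theano's
actual code): the generic version pays an indirect call per
multiply-accumulate, while a macro can stamp out one specialized loop
per dtype so the multiply compiles to a plain machine instruction.

```c
#include <stddef.h>

/* Slow pattern: a generic loop dispatching through a function
   pointer for every single multiply-accumulate. */
typedef void (*mac_fn)(void *acc, const void *a, const void *b);

/* Fast pattern: one specialized dot-product loop per dtype. */
#define DEFINE_DOT(name, T)                          \
    static T name(const T *a, const T *b, size_t n)  \
    {                                                \
        T acc = 0;                                   \
        for (size_t i = 0; i < n; ++i)               \
            acc += a[i] * b[i];  /* plain multiply */\
        return acc;                                  \
    }
DEFINE_DOT(dot_f32, float)
DEFINE_DOT(dot_f64, double)
```

The dispatch on dtype then happens once per call, outside the hot
loop, instead of once per element.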

4) Do some type of loop unrolling.
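For example, a dot-product inner loop unrolled by 4 (a sketch, not
Theano's code): fewer branch tests per element, and the independent
accumulators expose instruction-level parallelism.

```c
#include <stddef.h>

/* Dot product unrolled by 4, with a remainder loop for n % 4
   trailing elements. */
static float dot_unrolled4(const float *a, const float *b, size_t n)
{
    float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    float acc = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i)   /* remainder */
        acc += a[i] * b[i];
    return acc;
}
```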

If someone wants to see the part of the Theano code that could be the
most portable, it is this:

https://github.com/Theano/Theano/blob/master/theano/tensor/nnet/conv.py#L2167

It does those 4 optimizations, and I point to just the code that does
the computation, so it should be readable by people who know the
NumPy C-API. We have a paper that compares this implementation to the
SciPy one. From memory it was 100x faster... but for neural networks,
as it also generalizes this code to do more than one convolution per
call. That is why there are other loops before line 2167. Also, the
parallel version doesn't always speed things up, so I disabled it by
default in Theano. It needs a check that disables it when the shapes
are too small.

If someone looks at this and has questions, I can answer them.

HTH

Fred



On Tue, Dec 31, 2013 at 10:50 AM, Ralf Gommers <ralf.gommers at gmail.com> wrote:
>
>
>
> On Tue, Dec 31, 2013 at 4:07 PM, Luke Pfister <luke.pfister at gmail.com>
> wrote:
>>
>> I *believe* that Matlab is calling either the Intel MKL or Intel IPP
>> convolution routines, which is why they are so much faster.
>>
>> I ran into a situation where I needed to perform many, many small 2D
>> convolutions, and wound up writing a Cython wrapper to call the IPP
>> convolution.  I seem to remember getting speedups of ~200x when
>> convolving an 8x8 kernel with a 512x512 image.
>>
>> I'm not familiar with how the Scipy convolution functions are
>> implemented under the hood.  Do they use efficient algorithms for
>> small convolution sizes (ie, overlap-add, overlap-save)?
>
>
> It looks like the implementation is very straightforward and could benefit
> from some optimization:
> Convolve2d:
>
> https://github.com/scipy/scipy/blob/master/scipy/signal/sigtoolsmodule.c#L1006
>     https://github.com/scipy/scipy/blob/master/scipy/signal/firfilter.c#L84
> And correlate2d just calls convolve2d:
>
> https://github.com/scipy/scipy/blob/master/scipy/signal/signaltools.py#L503
>
>
>>
>>
>> --
>> Luke
>>
>> On Tue, Dec 31, 2013 at 8:49 AM, Aaron Webster <awebster at falsecolour.com>
>> wrote:
>> > On Tue, Dec 31, 2013 at 2:42 PM, Ralf Gommers <ralf.gommers at gmail.com>
>> > wrote:
>> >>
>> >>
>> >>
>> >> On Tue, Dec 31, 2013 at 1:43 PM, awebster at falsecolour.com
>> >> <awebster at falsecolour.com> wrote:
>> >>> I noticed a couple of popular matlab functions - conv2 and
>> >>> normxcorr2 were not present in the scipy.signal packages.  I would
>> >>> like to submit them for addition.  Can anyone point me on
>> >>> instructions on how to write such a thing?  Below are examples.
>> >>>
>> >>
>> >> Hi Aaron, isn't conv2 the same as signal.convolve2d? And can what
>> >> normxcorr2 does be done with signal.correlate2d?
>> >>
>> > I did a quick test and it seems that you are correct: signal.convolve2d
>> > appears to generate basically the same output as conv2, and following
>> > normxcorr2 can be done with signal.correlate2d.  However, I noticed
>> > while doing this that both signal.convolve2d and signal.correlate2d are
>> > *extremely* slow.  For example, on my computer with a random 100x100
>> > matrix, signal.correlate2d takes 4.73 seconds while normxcorr2 takes
>> > 0.253 seconds.  The results are similar for signal.convolve2d and conv2.
>> >
>> > As a practical matter, would it make most sense to fix
>> > signal.correlate2d and signal.convolve2d, or implement new functions?
>
>
> Speeding up the existing function would be preferable. firfilter.c already
> contains a suggestion on how to do that.
>
> Ralf
>
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>


