[SciPy-Dev] conv2 and normxcorr2

Frédéric Bastien nouiz at nouiz.org
Mon May 12 11:34:44 EDT 2014


Just a follow-up. There is a new paper on this subject. They found that on
the GPU, for the large number of convolutions needed by convolutional
neural networks, it is worthwhile to do the conversion to the FFT domain:

http://arxiv.org/abs/1312.5851
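A rough illustration of why the FFT pays off when there are many convolutions. If every image must be convolved with every kernel, each image and each kernel needs only one forward FFT. That transform is then reused across all image/kernel pairs, so its cost is amortized. The sketch below shows this reuse on the CPU with NumPy. It is a simplification under that assumption, not the paper's GPU implementation, and `batched_fft_conv2d` is a hypothetical name.

```python
import numpy as np
from scipy import signal


def batched_fft_conv2d(images, kernels):
    """Full 2D convolution of every image with every kernel (hypothetical helper).

    images: (n_img, H, W); kernels: (n_ker, kh, kw).
    Returns (n_img, n_ker, H + kh - 1, W + kw - 1).
    """
    _, h, w = images.shape
    _, kh, kw = kernels.shape
    shape = (h + kh - 1, w + kw - 1)  # linear (not circular) convolution size
    # One forward FFT per image and per kernel, reused for all pairs.
    f_img = np.fft.rfft2(images, s=shape)
    f_ker = np.fft.rfft2(kernels, s=shape)
    # Pointwise product for every (image, kernel) pair via broadcasting.
    prod = f_img[:, None, :, :] * f_ker[None, :, :, :]
    # One inverse FFT per pair brings the results back to the spatial domain.
    return np.fft.irfft2(prod, s=shape)


rng = np.random.default_rng(0)
imgs = rng.standard_normal((3, 32, 32))
kers = rng.standard_normal((4, 5, 5))
out = batched_fft_conv2d(imgs, kers)
print(np.allclose(out[1, 2], signal.convolve2d(imgs[1], kers[2], mode="full")))  # True
```

With n images and m kernels, this needs n + m forward transforms instead of 2nm.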

Fred


On Tue, Jan 7, 2014 at 9:03 AM, Frédéric Bastien <nouiz at nouiz.org> wrote:

> Using the FFT version needs a conversion to and from the FFT space. This
> takes time, so the FFT version is useful only for big convolutions.
> For small convolutions, the direct version is faster. But I never
> timed it, so I can't say where the boundary between small and big
> lies. It also depends on how efficient both versions are. Compared
> with the current slow version in SciPy, the FFT version would be
> practically always faster.
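Where the crossover lies depends on the machine and on both implementations. One rough way to measure it is to use SciPy's `scipy.signal.convolve2d` (direct) and `scipy.signal.fftconvolve` (FFT-based) as the two contenders. The sizes below are arbitrary, and no particular timings are claimed here. `compare` is a hypothetical helper.

```python
import timeit

import numpy as np
from scipy import signal


def compare(image_size, kernel_size, repeat=3, seed=0):
    """Return (direct_seconds, fft_seconds) for one square image/kernel size."""
    rng = np.random.default_rng(seed)
    image = rng.standard_normal((image_size, image_size))
    kernel = rng.standard_normal((kernel_size, kernel_size))
    # The two methods must agree before their timings mean anything.
    assert np.allclose(signal.convolve2d(image, kernel, mode="full"),
                       signal.fftconvolve(image, kernel, mode="full"))
    t_direct = min(timeit.repeat(
        lambda: signal.convolve2d(image, kernel, mode="full"),
        number=1, repeat=repeat))
    t_fft = min(timeit.repeat(
        lambda: signal.fftconvolve(image, kernel, mode="full"),
        number=1, repeat=repeat))
    return t_direct, t_fft


for n, k in [(64, 3), (64, 15), (128, 3), (128, 31)]:
    t_d, t_f = compare(n, k)
    print(f"image {n}x{n}, kernel {k}x{k}: direct {t_d:.4f}s, fft {t_f:.4f}s")
```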
>
>
> If someone has the time to compare the FFT version against the Theano
> implementation, we could get an idea of the size threshold for each case.
>
> Fred
>
> On Tue, Jan 7, 2014 at 12:32 AM, Aaron Webster <awebster at falsecolour.com>
> wrote:
> > Do you know the reason why the convolutions here aren't computed using
> > their Fourier transform property? It seems like this is an obvious path
> > to take advantage of existing code and speed.
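The property in question is the convolution theorem. The DFT of a linear convolution equals the product of the DFTs, provided both inputs are zero-padded to at least the full output size so that the circular convolution computed by the FFT does not wrap around. For real inputs, cross-correlation is convolution with the kernel reversed along both axes. Below is a minimal NumPy sketch of both in "full" mode. The function names are hypothetical, and SciPy's own `scipy.signal.fftconvolve` already packages the first idea.

```python
import numpy as np
from scipy import signal


def fft_convolve2d_full(image, kernel):
    """2D linear convolution ('full' mode) via the convolution theorem."""
    out_shape = (image.shape[0] + kernel.shape[0] - 1,
                 image.shape[1] + kernel.shape[1] - 1)
    # Zero-pad both operands to the full output size so the FFT's circular
    # convolution equals the linear one.
    f_image = np.fft.rfft2(image, s=out_shape)
    f_kernel = np.fft.rfft2(kernel, s=out_shape)
    return np.fft.irfft2(f_image * f_kernel, s=out_shape)


def fft_correlate2d_full(image, kernel):
    """Real-valued cross-correlation: convolve with the kernel flipped on both axes."""
    return fft_convolve2d_full(image, kernel[::-1, ::-1])


rng = np.random.default_rng(1)
a = rng.standard_normal((50, 40))
b = rng.standard_normal((7, 9))
print(np.allclose(fft_convolve2d_full(a, b), signal.convolve2d(a, b)))    # True
print(np.allclose(fft_correlate2d_full(a, b), signal.correlate2d(a, b)))  # True
```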
> >
> >
> > On Mon, Jan 6, 2014 at 10:45 PM, Frédéric Bastien <nouiz at nouiz.org>
> > wrote:
> >>
> >> Hi,
> >>
> >> I implemented faster CPU convolution in Theano. The code isn't easily
> >> portable... So here are the optimizations that I recall were important.
> >>
> >> 1) In the inner loop, there is an if. This is very bad speed-wise. We
> >> can replace the inner loop containing the if with 3 consecutive inner
> >> loops: one for before the image (where we pad with 0 or something
> >> else), a second for when we are inside the image, and a last one for
> >> after. In valid mode, the first and last loops will be empty.
> >>
> >> 2) Don't copy data! This is very slow and not needed in many cases.
> >>
> >> 3) Don't use a jump table of function calls just to do a
> >> multiplication. This is done to make the code work for all dtypes.
> >> Instead, have a separate code path for each dtype. Doing a
> >> pseudo-function call just to do a multiplication is very slow.
> >>
> >> 4) Do some type of loop unrolling.
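Point 1 can be illustrated on a single row. For each output position, the range of kernel taps that land inside the signal is computed once. The tap loop is then split into three branch-free loops: padding on one side, real samples, padding on the other. This is a 1D Python sketch of the loop structure only. Pure-Python loops are slow, so the speed benefit appears in C, and `conv1d_full_split` is a hypothetical name, not Theano's code.

```python
import numpy as np


def conv1d_full_split(x, h, fill=0.0):
    """1D 'full' convolution of x with h, treating x as padded with `fill`.

    The tap loop is split into three loops instead of testing
    "is x[n - k] inside the signal?" on every multiply.
    """
    n_x, n_h = len(x), len(h)
    out = [0.0] * (n_x + n_h - 1)
    for n in range(len(out)):
        # Taps k with 0 <= n - k <= n_x - 1 read real samples.
        k_lo = max(0, n - n_x + 1)
        k_hi = min(n_h - 1, n)
        acc = 0.0
        # Loop 1: taps k < k_lo would read past the end of x (padding).
        for k in range(0, k_lo):
            acc += h[k] * fill
        # Loop 2: taps inside the signal; no bounds check needed.
        for k in range(k_lo, k_hi + 1):
            acc += h[k] * x[n - k]
        # Loop 3: taps k > k_hi would read before the start of x (padding).
        for k in range(k_hi + 1, n_h):
            acc += h[k] * fill
        out[n] = acc
    return out


print(conv1d_full_split([1, 2, 3], [1, 1]))  # [1.0, 3.0, 5.0, 3.0]
rng = np.random.default_rng(0)
x, h = rng.standard_normal(20), rng.standard_normal(5)
print(np.allclose(conv1d_full_split(x, h), np.convolve(x, h)))  # True
```

With zero padding, loops 1 and 3 add nothing and could be dropped. In valid mode their ranges would be empty, matching the remark above.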
> >>
> >> If someone wants to see the part of the Theano code that could be the
> >> most portable, it is this:
> >>
> >> https://github.com/Theano/Theano/blob/master/theano/tensor/nnet/conv.py#L2167
> >>
> >> It does those 4 optimizations, and I point to just the code that does the
> >> computation, so it should be readable by people who know the NumPy C-API.
> >> We had a paper that compared this implementation to the SciPy one. From
> >> memory it was 100x faster... but that was for neural networks, as it also
> >> generalizes this code to do more than one convolution per call. That is
> >> why there are other loops before line 2167. Also, the parallel version
> >> doesn't always speed things up, so I disabled it by default in Theano. It
> >> needs tests to disable it when the shapes are too small.
> >>
> >> If someone looks at this and has questions, I can answer them.
> >>
> >> HTH
> >>
> >> Fred
> >>
> >>
> >>
> >> On Tue, Dec 31, 2013 at 10:50 AM, Ralf Gommers <ralf.gommers at gmail.com>
> >> wrote:
> >> >
> >> >
> >> >
> >> > On Tue, Dec 31, 2013 at 4:07 PM, Luke Pfister <luke.pfister at gmail.com>
> >> > wrote:
> >> >>
> >> >> I *believe* that Matlab is calling either the Intel MKL or Intel IPP
> >> >> convolution routines, which is why they are so much faster.
> >> >>
> >> >> I ran into a situation where I needed to perform many, many small 2D
> >> >> convolutions, and wound up writing a Cython wrapper to call the IPP
> >> >> convolution.  I seem to remember getting speedups of ~200x when
> >> >> convolving an 8x8 kernel with a 512x512 image.
> >> >>
> >> >> I'm not familiar with how the SciPy convolution functions are
> >> >> implemented under the hood. Do they use efficient algorithms for
> >> >> small convolution sizes (i.e., overlap-add, overlap-save)?
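For reference, overlap-add splits a long signal into blocks, convolves each block with the short kernel via an FFT sized to hold a block's full linear convolution, and adds the overlapping tails together. A minimal 1D NumPy sketch follows; the 2D case tiles the image the same way. `overlap_add` and its `block` parameter are hypothetical names. Recent SciPy versions (1.4 and later) ship `scipy.signal.oaconvolve` for this.

```python
import numpy as np


def overlap_add(x, h, block=64):
    """1D 'full' convolution of x with h by the overlap-add method."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    m = len(h)
    # FFT length: a power of two large enough for one block's linear convolution.
    nfft = 1
    while nfft < block + m - 1:
        nfft *= 2
    H = np.fft.rfft(h, nfft)  # kernel transform computed once, reused per block
    out = np.zeros(len(x) + m - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        y = np.fft.irfft(np.fft.rfft(seg, nfft) * H, nfft)
        n_valid = len(seg) + m - 1  # length of this block's linear convolution
        out[start:start + n_valid] += y[:n_valid]  # tails overlap the next block
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
h = rng.standard_normal(17)
print(np.allclose(overlap_add(x, h, block=64), np.convolve(x, h)))  # True
```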
> >> >
> >> >
> >> > It looks like the implementation is very straightforward and could
> >> > benefit from some optimization.
> >> > convolve2d:
> >> >
> >> > https://github.com/scipy/scipy/blob/master/scipy/signal/sigtoolsmodule.c#L1006
> >> > https://github.com/scipy/scipy/blob/master/scipy/signal/firfilter.c#L84
> >> >
> >> > And correlate2d just calls convolve2d:
> >> >
> >> > https://github.com/scipy/scipy/blob/master/scipy/signal/signaltools.py#L503
> >> >
> >> >
> >> >>
> >> >>
> >> >> --
> >> >> Luke
> >> >>
> >> >> On Tue, Dec 31, 2013 at 8:49 AM, Aaron Webster
> >> >> <awebster at falsecolour.com>
> >> >> wrote:
> >> >> > On Tue, Dec 31, 2013 at 2:42 PM, Ralf Gommers
> >> >> > <ralf.gommers at gmail.com>
> >> >> > wrote:
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> On Tue, Dec 31, 2013 at 1:43 PM, awebster at falsecolour.com
> >> >> >> <awebster at falsecolour.com> wrote:
> >> >> >>> I noticed that a couple of popular MATLAB functions, conv2 and
> >> >> >>> normxcorr2, are not present in the scipy.signal package. I would
> >> >> >>> like to submit them for addition. Can anyone point me to
> >> >> >>> instructions on how to do that? Below are examples.
> >> >> >>>
> >> >> >>
> >> >> >> Hi Aaron, isn't conv2 the same as signal.convolve2d? And can what
> >> >> >> normxcorr2 does be done with signal.correlate2d?
> >> >> >>
> >> >> > I did a quick test and it seems that you are correct:
> >> >> > signal.convolve2d appears to generate basically the same output as
> >> >> > conv2, and normxcorr2 can likewise be done with signal.correlate2d.
> >> >> > However, while doing this I noticed that both signal.convolve2d and
> >> >> > signal.correlate2d are *extremely* slow. For example, on my computer,
> >> >> > with a random 100x100 matrix, signal.correlate2d takes 4.73 seconds
> >> >> > while normxcorr2 takes 0.253 seconds. The results are similar for
> >> >> > signal.convolve2d and conv2.
> >> >> >
> >> >> > As a practical matter, would it make more sense to fix
> >> >> > signal.correlate2d and signal.convolve2d, or to implement new
> >> >> > functions?
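One way normxcorr2-style output can be built from `signal.correlate2d` is sketched below. Each output value is the Pearson correlation between the template and the image window it overlaps. The numerator is a correlation with the zero-mean template. The denominator uses local window sums of f and f squared, so that the sum of (f - mean)^2 equals sum(f^2) - (sum f)^2 / n. This is a sketch under the assumption that the image is zero-padded in "full" mode. `normxcorr2` here is a hypothetical helper, not a SciPy function, and it is not guaranteed to match MATLAB's output bit for bit.

```python
import numpy as np
from scipy import signal


def normxcorr2(template, image, eps=1e-12):
    """Normalized cross-correlation, 'full' output shape (hypothetical helper)."""
    template = np.asarray(template, dtype=float)
    image = np.asarray(image, dtype=float)
    n = template.size
    t = template - template.mean()
    # Numerator: t sums to zero, so sum((f - mean_f) * t) == sum(f * t).
    num = signal.correlate2d(image, t, mode="full")
    # Local sums of f and f**2 over every template-sized window.
    ones = np.ones(template.shape)
    s1 = signal.correlate2d(image, ones, mode="full")
    s2 = signal.correlate2d(image ** 2, ones, mode="full")
    window_ss = np.maximum(s2 - s1 ** 2 / n, 0.0)  # sum of squared deviations
    denom = np.sqrt(window_ss * np.sum(t ** 2))
    out = np.zeros_like(num)
    mask = denom > eps  # leave flat (zero-variance) windows at 0
    out[mask] = num[mask] / denom[mask]
    return out


rng = np.random.default_rng(2)
img = rng.random((20, 20))
tpl = img[5:10, 7:12]  # template cut out of the image at row 5, column 7
c = normxcorr2(tpl, img)
peak = np.unravel_index(np.argmax(c), c.shape)
print(tuple(int(i) for i in peak))  # (9, 11): offset by template size - 1 in 'full' mode
print(round(float(c[peak]), 6))     # 1.0
```

This version still calls the direct correlate2d three times, so it inherits that function's speed. Swapping in an FFT-based correlation would follow the earlier discussion in this thread.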
> >> >
> >> >
> >> > Speeding up the existing functions would be preferable. firfilter.c
> >> > already contains a suggestion on how to do that.
> >> >
> >> > Ralf
> >> >
> >> >
> >> > _______________________________________________
> >> > SciPy-Dev mailing list
> >> > SciPy-Dev at scipy.org
> >> > http://mail.scipy.org/mailman/listinfo/scipy-dev
> >> >
> >
> >
> >
> >
> > --
> > Aaron Webster
> >
> >
>

