[SciPy-dev] fftw2 and fftw3

Arnd Baecker arnd.baecker at web.de
Thu Dec 15 09:18:27 EST 2005


Hi,

now also the results for the fftw2 vs. fftw3 comparison on the
Itanium 2 are there.

There is the same problem as on the other machines, for example:

fftw3
=====

                 Fast Fourier Transform
=================================================
      |    real input     |   complex input
-------------------------------------------------
 size |  scipy  | Numeric |  scipy  | Numeric
-------------------------------------------------
  100 |    1.23 |    1.59 |    9.59 |    1.56  (secs for 7000 calls)
 1000 |    1.01 |    3.08 |    9.38 |    2.98  (secs for 2000 calls)
  256 |    2.32 |    3.71 |   18.95 |    3.65  (secs for 10000 calls)
  512 |    3.39 |    8.25 |   26.69 |    8.08  (secs for 10000 calls)
 1024 |    0.55 |    1.45 |    4.34 |    1.42  (secs for 1000 calls)
 2048 |    0.96 |    3.19 |    7.57 |    3.15  (secs for 1000 calls)
 4096 |    0.90 |    3.05 |    7.13 |    3.01  (secs for 500 calls)
 8192 |    1.97 |    7.91 |   14.83 |    7.86  (secs for 500 calls)
....
    Multi-dimensional Fast Fourier Transform
===================================================
          |    real input     |   complex input
---------------------------------------------------
   size   |  scipy  | Numeric |  scipy  |  Numeric
---------------------------------------------------
  100x100 |    0.80 |    2.31 |    0.78 |    2.30  (secs for 100 calls)
 1000x100 |    0.52 |    2.08 |    0.56 |    2.08  (secs for 7 calls)
  256x256 |    0.67 |    1.59 |    0.64 |    1.59  (secs for 10 calls)
  512x512 |    1.63 |    3.45 |    1.59 |    3.49  (secs for 3 calls)
.....



fftw2
=====

                 Fast Fourier Transform
=================================================
      |    real input     |   complex input
-------------------------------------------------
 size |  scipy  | Numeric |  scipy  | Numeric
-------------------------------------------------
  100 |    1.23 |    1.58 |    1.14 |    1.56  (secs for 7000 calls)
 1000 |    1.02 |    3.03 |    0.80 |    3.00  (secs for 2000 calls)
  256 |    2.26 |    3.72 |    1.86 |    3.63  (secs for 10000 calls)
  512 |    3.07 |    8.24 |    2.35 |    8.09  (secs for 10000 calls)
 1024 |    0.51 |    1.44 |    0.41 |    1.41  (secs for 1000 calls)
 2048 |    0.89 |    3.19 |    0.70 |    3.14  (secs for 1000 calls)
 4096 |    0.83 |    3.05 |    0.68 |    3.02  (secs for 500 calls)
 8192 |    1.64 |    7.86 |    1.41 |    7.82  (secs for 500 calls)
....
    Multi-dimensional Fast Fourier Transform
===================================================
          |    real input     |   complex input
---------------------------------------------------
   size   |  scipy  | Numeric |  scipy  |  Numeric
---------------------------------------------------
  100x100 |    0.41 |    2.28 |    0.41 |    2.28  (secs for 100 calls)
 1000x100 |    0.37 |    2.06 |    0.40 |    2.08  (secs for 7 calls)
  256x256 |    0.38 |    1.60 |    0.37 |    1.61  (secs for 10 calls)
  512x512 |    0.91 |    3.59 |    1.10 |    3.53  (secs for 3 calls)
.....

So using fftw3 from scipy is drastically slower compared to fftw2,
in particular for complex arrays.

In order to be sure, that it is not fftw3 by itself which is slow,
I installed benchfft http://www.fftw.org/benchfft/
For size 8192, complex vectors, the results read:

cat fftw2/doitm_nd.double.speed | grep  8192 |grep dc

fftw2-nd-measure dcif 8192 2483.9 0.000214375 15.039
fftw2-nd-measure dcib 8192 2442.6 0.000218 15.0292
fftw2-nd-measure dcof 8192 3083.5 0.0001726875 14.5242
fftw2-nd-measure dcob 8192 3072.4 0.0001733125 14.7939

cat fftw3/doit.double.speed | grep  8192 |grep dc
fftw3 dcif 8192 3141.5 0.0001695 4.27479
fftw3 dcib 8192 3169.5 0.000168 4.27484
fftw3 dcof 8192 3297.1 0.0001615 2.31973
fftw3 dcob 8192 3359.5 0.0001585 2.26557

Remarks
 - i: in-place   (o: out-of-place)
 - c: complex    (r: real)
 - f: forward    (b: backwards fft)
 - The 4-th column is the ``mflops'' number.

This clearly shows that
fftw3 is faster than fftw2 (in particular for "i" - inplace situation)
(and not at a all a factor of 10 slower!).

Therefore the results for using fftw3 from python are completely puzzling.

I am at complete loss of what is going on as I don't understand
fftpack at all.
Could an fft expert have a look or give some guidance on
how to explore this further?


Best, Arnd









More information about the SciPy-Dev mailing list