[SciPy-dev] fftw2 and fftw3
Arnd Baecker
arnd.baecker at web.de
Thu Dec 15 09:18:27 EST 2005
Hi,
now also the results for the fftw2 vs. fftw3 comparison on the
Itanium 2 are there.
There is the same problem as on the other machines, for example:
fftw3
=====
Fast Fourier Transform
=================================================
| real input | complex input
-------------------------------------------------
size | scipy | Numeric | scipy | Numeric
-------------------------------------------------
100 | 1.23 | 1.59 | 9.59 | 1.56 (secs for 7000 calls)
1000 | 1.01 | 3.08 | 9.38 | 2.98 (secs for 2000 calls)
256 | 2.32 | 3.71 | 18.95 | 3.65 (secs for 10000 calls)
512 | 3.39 | 8.25 | 26.69 | 8.08 (secs for 10000 calls)
1024 | 0.55 | 1.45 | 4.34 | 1.42 (secs for 1000 calls)
2048 | 0.96 | 3.19 | 7.57 | 3.15 (secs for 1000 calls)
4096 | 0.90 | 3.05 | 7.13 | 3.01 (secs for 500 calls)
8192 | 1.97 | 7.91 | 14.83 | 7.86 (secs for 500 calls)
....
Multi-dimensional Fast Fourier Transform
===================================================
| real input | complex input
---------------------------------------------------
size | scipy | Numeric | scipy | Numeric
---------------------------------------------------
100x100 | 0.80 | 2.31 | 0.78 | 2.30 (secs for 100 calls)
1000x100 | 0.52 | 2.08 | 0.56 | 2.08 (secs for 7 calls)
256x256 | 0.67 | 1.59 | 0.64 | 1.59 (secs for 10 calls)
512x512 | 1.63 | 3.45 | 1.59 | 3.49 (secs for 3 calls)
.....
fftw2
=====
Fast Fourier Transform
=================================================
| real input | complex input
-------------------------------------------------
size | scipy | Numeric | scipy | Numeric
-------------------------------------------------
100 | 1.23 | 1.58 | 1.14 | 1.56 (secs for 7000 calls)
1000 | 1.02 | 3.03 | 0.80 | 3.00 (secs for 2000 calls)
256 | 2.26 | 3.72 | 1.86 | 3.63 (secs for 10000 calls)
512 | 3.07 | 8.24 | 2.35 | 8.09 (secs for 10000 calls)
1024 | 0.51 | 1.44 | 0.41 | 1.41 (secs for 1000 calls)
2048 | 0.89 | 3.19 | 0.70 | 3.14 (secs for 1000 calls)
4096 | 0.83 | 3.05 | 0.68 | 3.02 (secs for 500 calls)
8192 | 1.64 | 7.86 | 1.41 | 7.82 (secs for 500 calls)
....
Multi-dimensional Fast Fourier Transform
===================================================
| real input | complex input
---------------------------------------------------
size | scipy | Numeric | scipy | Numeric
---------------------------------------------------
100x100 | 0.41 | 2.28 | 0.41 | 2.28 (secs for 100 calls)
1000x100 | 0.37 | 2.06 | 0.40 | 2.08 (secs for 7 calls)
256x256 | 0.38 | 1.60 | 0.37 | 1.61 (secs for 10 calls)
512x512 | 0.91 | 3.59 | 1.10 | 3.53 (secs for 3 calls)
.....
So using fftw3 from scipy is drastically slower compared to fftw2,
in particular for complex arrays.
In order to be sure, that it is not fftw3 by itself which is slow,
I installed benchfft http://www.fftw.org/benchfft/
For size 8192, complex vectors, the results read:
cat fftw2/doitm_nd.double.speed | grep 8192 |grep dc
fftw2-nd-measure dcif 8192 2483.9 0.000214375 15.039
fftw2-nd-measure dcib 8192 2442.6 0.000218 15.0292
fftw2-nd-measure dcof 8192 3083.5 0.0001726875 14.5242
fftw2-nd-measure dcob 8192 3072.4 0.0001733125 14.7939
cat fftw3/doit.double.speed | grep 8192 |grep dc
fftw3 dcif 8192 3141.5 0.0001695 4.27479
fftw3 dcib 8192 3169.5 0.000168 4.27484
fftw3 dcof 8192 3297.1 0.0001615 2.31973
fftw3 dcob 8192 3359.5 0.0001585 2.26557
Remarks
- i: in-place (o: out-of-place)
- c: complex (r: real)
- f: forward (b: backwards fft)
- The 4-th column is the ``mflops'' number.
This clearly shows that
fftw3 is faster than fftw2 (in particular for "i" - inplace situation)
(and not at a all a factor of 10 slower!).
Therefore the results for using fftw3 from python are completely puzzling.
I am at complete loss of what is going on as I don't understand
fftpack at all.
Could an fft expert have a look or give some guidance on
how to explore this further?
Best, Arnd
More information about the SciPy-Dev
mailing list