[SciPy-dev] fft benchmarks, chunked and buffered

Sun May 18 19:53:11 EDT 2003

Hi Pearu,

I went back to the basic rearanged radix 2 fft
and started fooling around with data chunking and
buffering. As a side benifit, the basic routines are now strided, so
multidimensional transforms should be easy.

The cache worked a bit differently than I thought,
and I still don't understand it. However, here are
some results that show improvements are certainly
possible:

      Fast Fourier Transform
================================
           |    complex input  |
--------------------------------
 size      |  scipy  | testing |
--------------------------------
       256 |    2.32 |    2.04  (10000 calls)
       512 |    1.57 |    1.50  (5000 calls)
      1024 |    1.20 |    1.27  (2000 calls)
      2048 |    1.28 |    1.15  (1000 calls)
      4096 |    1.58 |    1.29  (500 calls)
      8192 |    3.39 |    2.24  (200 calls)
     16384 |    5.03 |    3.14  (100 calls)
     32768 |    4.64 |    3.58  (50 calls)
     65536 |    4.75 |    3.71  (20 calls)
    131072 |    5.24 |    3.43  (10 calls)
    262144 |    5.20 |    5.44  (5 calls)
    524288 |    5.43 |    5.21  (2 calls)
   1048576 |    6.12 |    5.37  (1 calls)
.
----------------------------------------------------------------------
Ran 1 tests in 43.943s

Still haven't tried it agaist FFTW, not ready to risk
getting blown away just yet ;)

The code is getting messy, although it produces correct
results. I think I'll put it aside for a bit and do some 
thinking.

Chuck