[SciPy-user] Benchmark data

arnd.baecker at web.de
Sat Dec 10 09:13:42 EST 2005


On Fri, 9 Dec 2005, Travis Oliphant wrote:

> I know people may be tired of the benchmark data,

In my opinion it is quite the contrary - the more benchmark results
we can get for scipy, the better!

What about setting up a `scipy.bench()` suite
(similar to `scipy.test()`)?  The benchmarks could be collected
as `bench_xxx.py` files (and their level could determine how long they
are expected to run).
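For illustration, here is a very rough sketch of what such a
bench_xxx.py file could look like (the name bench_fft.py, the
bench_fft() function and the level handling are just made up here;
the real convention would have to mirror whatever scipy.test() does,
and the code assumes current numpy/scipy names):

############
# hypothetical bench_fft.py - only a sketch
import time
import numpy as np
from scipy import fftpack

def bench_fft(level=1):
    # higher level -> larger transforms, longer running time
    sizes = [256, 1024, 4096]
    if level > 1:
        sizes += [16384, 65536]
    for n in sizes:
        x = np.random.rand(n)
        t0 = time.time()
        for i in range(100):
            fftpack.fft(x)
        print("fft, n=%6d: %.4f s for 100 calls" % (n, time.time() - t0))

if __name__ == "__main__":
    bench_fft(level=1)
############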

For FFT, scipy.test already includes a speed comparison,
but I would like to see more of this
(essentially for all subpackages: linalg, sparse, integrate, interpolate,
stats, random, ...
and in general for ufunc operations, scalar operations etc.).
This would make it possible to compare the efficiency of different LAPACK
variants, compiler settings, different compilers etc.

I think it would be great if users could contribute simple benchmark
examples of their area of interest.
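As a concrete (hypothetical) example of such a contribution, a tiny
linalg benchmark along the following lines, run on machines with
different LAPACK/ATLAS builds, would already give useful numbers
(the name, sizes and repeat counts are arbitrary):

############
# hypothetical bench_solve.py - only a sketch
import time
import numpy as np
from scipy import linalg

def bench_solve(sizes=(100, 300, 500), repeats=5):
    for n in sizes:
        A = np.random.rand(n, n)
        b = np.random.rand(n)
        t0 = time.time()
        for i in range(repeats):
            linalg.solve(A, b)
        print("solve, n=%4d: %.4f s for %d calls"
              % (n, time.time() - t0, repeats))

if __name__ == "__main__":
    bench_solve()
############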

> but I'm just trying to
> understand what kinds of techniques produce fast code.

I very much appreciate that you take this issue so seriously!!

[...]

> So, I'm not sure how to reproduce what Gerard sees (except numarray's
> faster arange)
> which is a little perplexing.  I suppose that's why people criticize
> benchmarks so much.

In the end, the only thing that counts is how fast a given piece of
code runs. But there is no (easy?) way to find the optimal
compile flags - the space spanned by all possible options is
just too big. And varying the problem size can
lead to different optimal settings ...
The task is not made simpler by different compilers,
different CPUs/Cache sizes,
different distributions, different kernels, whatever else ...

So I would not be surprised at all if gcc 4.x or Intel's icc
produce different results.

Just one remark on bench.py: it uses time.time(),
so it measures wall-clock time rather than the CPU time of the process.
The CPU time could be determined with jiffies:
  from scipy.test.testing import jiffies
Another option might be timeit.py, see
  In [4]: import timeit
  In [5]: timeit?
(worth reading!)
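To give an idea, something along these lines works (the statement and
setup strings below are just an example and assume numpy is installed):

############
import timeit

t = timeit.Timer("dot(a, a)",
                 setup="from numpy import dot, random; a = random.rand(100, 100)")
# 3 runs of 100 executions each; take the minimum, as timeit recommends
print("%.4f s for 100 dot() calls" % min(t.repeat(repeat=3, number=100)))
############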

Some more remarks on profiling, which might be useful to
others as well:
Python code can be profiled using the `hotshot` module.
Together with kcachegrind
  http://kcachegrind.sourceforge.net/
and Joerg Beyer's script `hotshot2cachegrind.py`
you can use a GUI to inspect the profiling results:

############
import hotshot

def run():
    # call your stuff here ...
    pass

prof = hotshot.Profile("pythongrind.prof", lineevents=1)
prof.runcall(run)
prof.close()
############

- this will generate the profiling results in "pythongrind.prof"
- then use
     hotshot2cachegrind -o cachegrind.out.42 pythongrind.prof
  to convert
- Start
    kcachegrind cachegrind.out.42
  which will give a nice graphical interface to the profiling data

Remark: kcachegrind can also be used to display profiles of compiled (gcc) code.
(However, I don't think it is possible to descend from
the Python results down to the C level of the called functions ...)

Note that this is particularly helpful for finding and analysing
the main bottlenecks in a given piece of code.
So in the above case, where the speed of single
operations is compared, I don't think it will help much.
(Also note that it takes some time to understand the output;
a couple of things still look mysterious to me ;-) ...)

Thanks for all your effort!

Best, Arnd



