[SciPy-dev] [SciPy-user] Benchmark data

Fri Dec 9 15:01:41 EST 2005

Gerard Vermeulen wrote:

>On Fri, 09 Dec 2005 03:14:49 -0700
>Travis Oliphant <oliphant.travis at ieee.org> wrote:
>
>>I'd like people to try out scipy core in SVN.  I made improvements to the
>>buffered ufunc section of code that I think will make a big difference
>>in the recently published benchmarks. 
>
>Hi Travis,
>
>indeed, it made a big difference (for big arrays scipy is now fastest on some
>statements).
>
>Below are my benchmark results on my DIY python, see
>http://www.scipy.org/mailinglists/mailman?fn=scipy-user/2005-December/006057.html
>
>On my system and for large arrays (>4096), numarray is still fastest, scipy moved
>to second and Numeric is third.
>Numeric is still fastest for small arrays, scipy is second, numarray is third.
>
Numeric will always be faster for small-enough arrays, I think, because
it doesn't have the ufunc overhead. I just don't want it to be a lot
faster. We can improve the limiting scalar case in scipy_core by using
separate scalar math. It looks like we are doing reasonably well.
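
To make the small-array argument concrete, here is a toy cost model: each
call pays a fixed overhead plus a per-element cost. This is only a sketch.
The function names and every number in it are hypothetical, not
measurements from any of the three packages.

```python
def call_time(fixed_overhead, per_element, n):
    """Modelled cost of one array operation on n elements (arbitrary units)."""
    return fixed_overhead + per_element * n


def crossover_size(fixed_a, per_elem_a, fixed_b, per_elem_b):
    """Smallest n at which package B is at least as fast as package A.

    Costs are integers in the same arbitrary unit. Returns None when B
    never catches up under this model.
    """
    if fixed_b <= fixed_a:
        return 0  # B is already no slower on an empty array
    if per_elem_b >= per_elem_a:
        return None  # B pays more up front and never gains per element
    gap = fixed_b - fixed_a          # extra overhead B must amortize
    saving = per_elem_a - per_elem_b  # what B wins back per element
    return -(-gap // saving)          # integer ceiling division


# Hypothetical numbers: A has little call overhead, B a faster inner loop.
n_star = crossover_size(fixed_a=100, per_elem_a=10, fixed_b=1000, per_elem_b=8)
```

Below `n_star` the package with the smaller fixed overhead wins no matter
how good the other's inner loop is, which is why trimming the per-call
cost matters most for the scalar and small-array case.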

>Invoking: python bench.py 12
>Importing test to scipy
>Importing base to scipy
>Importing basic to scipy
>Python 2.4.2 (#1, Dec  4 2005, 08:21:04) 
>[GCC 3.4.3 (Mandrakelinux 10.2 3.4.3-7mdk)]
>Optimization flags: -DNDEBUG -O3 -march=i686
>CPU info: getNCPUs=2 has_mmx has_sse has_sse2 is_32bit is_Intel is_Pentium is_PentiumIV
>Numeric-24.2
>numarray-1.5.0
>scipy-core-0.8.1.1617
>benchmark size = 12  (vectors of length 16777216)
>label            Numeric       numarray     scipy.base
>    1             0.4127        0.07423         0.3927
>    2             0.2734         0.2321         0.3234
>    3             0.1975         0.1821         0.2733
>    4             0.8747         0.5371         0.5588
>    5             0.2896         0.2342         0.2737
>    6             0.2066         0.1731         0.2718
>    7             0.8761         0.6286         0.5524
>    8             0.6546         0.4556         0.4533
>    9              9.488          7.566          8.717
>   10              9.506          8.064          8.745
>   11              7.879          6.301          7.305
>TOTAL              30.66          24.45          27.87
>

As mentioned before, it looks like the optimizer is doing something nice
on your system. One issue is arange, which could definitely be made
faster by having different "fillers" for different types. I'm still
astonished at how markedly your numbers differ from those others have
shown. Is this all the -O3 optimization kicking in?
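
The idea of per-type "fillers" can be sketched as a dispatch table keyed
on the element type. This is only an illustration using the standard
library's array module; the names fill_int, fill_double, FILLERS and
arange_typed are made up here and are not scipy_core functions.

```python
from array import array


def fill_int(start, step, n):
    """Integer filler: stays in exact integer arithmetic (step must be nonzero)."""
    return array("q", range(start, start + step * n, step))


def fill_double(start, step, n):
    """Floating-point filler: computes start + i*step for each index."""
    return array("d", (start + i * step for i in range(n)))


# One specialized filler per element type, selected once per call
# instead of going through a generic path for every element.
FILLERS = {"q": fill_int, "d": fill_double}


def arange_typed(start, step, n, typecode):
    """Build an n-element range using the filler registered for typecode."""
    try:
        filler = FILLERS[typecode]
    except KeyError:
        raise TypeError("no filler registered for typecode %r" % typecode)
    return filler(start, step, n)
```

For example, `arange_typed(0, 1, 5, "q")` uses the integer filler and
`arange_typed(0.0, 0.5, 4, "d")` uses the floating-point one.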

The other issue is the sin and cosine functions. They don't have their
own inner loops. They call a generic inner loop that receives the
function to apply as "function-pointer" data. Perhaps the optimizer
can't do as much with that, or it needs to be written with an optimizer
in mind.
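
The structural difference between the two kinds of loop looks roughly
like this. It is a Python sketch of the shape only; the real loops are C
code, and the names generic_unary_loop and sin_loop are hypothetical.

```python
import math


def generic_unary_loop(func, src, dst):
    """One loop shared by every unary function.

    func plays the role of the function-pointer data: the loop cannot
    know ahead of time which function it will be calling.
    """
    for i in range(len(src)):
        dst[i] = func(src[i])


def sin_loop(src, dst):
    """Dedicated loop: the call target is fixed when the loop is written."""
    sin = math.sin
    for i in range(len(src)):
        dst[i] = sin(src[i])


values = [0.0, 0.5, 1.0, 1.5]
out_generic = [0.0] * len(values)
out_dedicated = [0.0] * len(values)
generic_unary_loop(math.sin, values, out_generic)
sin_loop(values, out_dedicated)
```

Both produce the same results. In C, the dedicated form gives the
compiler a known call target to work with, while the generic form makes
an indirect call through the pointer on every iteration. Whether that
accounts for the timings here is still a guess.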

Ultimately, though, I'd like to see some of the inner loops take
advantage of SSE (and equivalent) instructions when the number of
iterations is large enough. So, yes, I think we could get faster.
But I'd first like to get more data from more machines and compiler
flags to determine where the slowness is really coming from. It might
be good, for example, to break up one of lines 9, 10, and 11 so that at
least one sin and one cos calculation is timed alone.
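
A rough sketch of what timing the pieces separately could look like,
using only the standard library's timeit and math so it runs anywhere.
The statements are stand-ins: the actual expressions on lines 9 to 11 of
bench.py are not reproduced here, and the "combined" statement is
hypothetical.

```python
import timeit

# Setup shared by all statements: a plain list stands in for the array.
SETUP = "from math import sin, cos; xs = [i * 0.001 for i in range(10000)]"

STATEMENTS = {
    "sin alone": "[sin(x) for x in xs]",
    "cos alone": "[cos(x) for x in xs]",
    # Hypothetical compound statement, not taken from bench.py.
    "combined (hypothetical)": "[sin(x) * cos(x) + x for x in xs]",
}


def time_statements(statements, setup, repeat=3, number=10):
    """Return the best per-run time in seconds for each labelled statement."""
    results = {}
    for label, stmt in statements.items():
        timings = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
        results[label] = min(timings) / number
    return results


if __name__ == "__main__":
    for label, seconds in time_statements(STATEMENTS, SETUP).items():
        print(f"{label:>24} {seconds:.6f}")
```

Comparing the "alone" rows with the combined row shows how much of the
compound statement's time is the sin and cos calls themselves and how
much is everything around them.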

Thanks,

-Travis