Microbenchmark: Summing over array of doubles

Mon Aug 2 19:51:46 EDT 2004

Christopher T King <squirrel at WPI.EDU> wrote in message news:<Pine.LNX.4.44.0408011847510.21160-100000 at ccc4.wpi.edu>...
> On 31 Jul 2004, Yaroslav Bulatov wrote:
> 
> > I'm doing intensive computation on arrays in Python, so if you have
> > suggestions on Python/C solutions that could push the envelope, please
> > let me know.
> 
> If you're doing mostly vector calculations as opposed to summing, I've
> been doing some work on adding SIMD support to numarray, with pleasing
> results (around 2x speedups).  I've also done some work adding local
> parallel processing support to numarray, with not-so-pleasing results
> (mostly due to Python overhead).
> 
> Regarding your results:
> 
> numarray should be just as fast as the -O2 C version.  I was puzzled at
> first as to where the speed discrepancy came from, but the culprit is in
> the -O2 flag:  gcc -O2 notices that sum is never used, and thus removes
> the loop entirely.  As a matter of fact, there isn't even any fadd 
> instruction in the assembler output:
> 
>         call    clock
>         movl    %eax, %esi
>         movl    $9999999, %ebx
> .L11:
>         decl    %ebx
>         jns     .L11
>         subl    $16, %esp
>         call    clock
> 
> As you can see, the 21ms you're seeing is the time spent counting down
> from 9,999,999 to 0.  To obtain correct results, add a line such as
> 'printf("%f\n",sum);' after the main loop in the C version.  This will
> force gcc to leave the actual calculation in place and give you accurate
> results.
> 
> The above fix will likely render numarray faster than the C version.  
> Using gcc -O3 rather than gcc -O2 will get fairer results, as this is what 
> numarray uses.

You are right, how silly of me! With the script fixed, the C version now
takes 130 ms on average with a standard deviation of 8.42 ms, which is
slower than numarray (104 ms and 2.6 ms respectively). I wonder why
numarray gives faster results on such a simple task?
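
For anyone following along, here is a minimal sketch of the kind of fix
described above. It is not the original benchmark script: the names
sum_array and timed_sum_ms are made up for illustration, and the timing
uses clock() as in the assembler listing. The point is only that the sum
is printed and handed back to the caller, so the result is observable and
gcc has to keep the additions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: add up n doubles and return the total. */
double sum_array(const double *a, size_t n)
{
    double sum = 0.0;

    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hypothetical helper: time one pass over the array with clock().
   Returns the elapsed CPU time in milliseconds and stores the sum in
   *out_sum.  Printing the sum is what keeps the loop from being
   removed as dead code. */
double timed_sum_ms(const double *a, size_t n, double *out_sum)
{
    clock_t start = clock();
    double sum = sum_array(a, n);
    clock_t stop = clock();

    printf("%f\n", sum);   /* the sum is now used, so the loop must stay */
    *out_sum = sum;
    return 1000.0 * (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling timed_sum_ms repeatedly on a 10,000,000-element array and
collecting the returned times would give the mean and standard deviation
figures. It is still worth checking the assembler output (gcc -S) for the
floating-point add, since this sketch only assumes the optimizer behaves
as described.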

> Is there any reason why in the Python/numarray version, you use 
> Numeric's RandomArray rather than numarray.random_array?  It shouldn't 
> affect your results, but it would speed up initialization time a bit.

There isn't a good reason; I simply didn't know about
numarray.random_array.

> 
> There are a few inefficiencies in the pytime module (mostly involving
> range() and *args/**kwargs), but I don't think they'll have too big of an 
> impact on your results.  Instead, I'd suggest running the numarray/Numeric 
> tests using Psyco to remove much of the Python overhead.
> 
> For completeness, I'd also suggest both running the Java version using a 
> JIT compiler such as Kaffe, and compiling it natively using gcj (the 
> latter should approach the speed of C).