Microbenchmark: Summing over array of doubles

Mon Aug 2 05:11:48 EDT 2004

Christopher T King <squirrel at WPI.EDU> wrote in
news:Pine.LNX.4.44.0408011840050.21160-100000 at ccc4.wpi.edu: 

> On 1 Aug 2004, Duncan Booth wrote:
> 
>> I just had a brief look at the code you posted. Are you not concerned
>> about accuracy in any of your calculations? Summing a 10 million
>> element array by simply accumulating each element into a running
>> total is guaranteed to give a lousy result.
> 
> Lousy or not, I believe that's how numarray is implemented internally,
> so at least all the benchmarks are the same.  If you want accuracy
> summing that many numbers, you're going to have to do it in software
> (likely by summing each mantissa bit individually and reconstructing
> the float afterward), so it will be abysmally slow (obviously not what
> the OP wants).
> 

My point being that speed isn't everything. Most applications doing large 
floating point calculations should be concerned about accuracy, and how not 
to add up a large array of floating point numbers was one of the first 
things I was taught in computer science classes. The fastest way to sum 10 
million numbers (albeit at some considerable loss of accuracy):

    	return 0

The 'correct' way to sum a large array of floats is, of course, to 
calculate a lot of partial sums and sum them. For example, naively, we 
might say that to sum the array we sum the first half, sum the second half 
and add the results together. This definition being recursive ensures that 
if the inputs are all of a similar size then all the intermediate 
calculations involve summing similarly sized values. A more sophisticated 
technique might also try to handle the case where not all values are a 
similar size.

If you are worried about speed, then calculating partial sums is also the 
way to go: the naive technique used by the original poster leaves no scope 
for parallel calculation. It should be possible to write a slightly more 
complex loop in the C version that runs a lot faster on systems that are 
capable of performing multiple floating point instructions in parallel.