Microbenchmark: Summing over array of doubles

Yaroslav Bulatov bulatov at engr.orst.edu
Mon Aug 2 19:32:21 EDT 2004


Duncan Booth <me at privacy.net> wrote in message news:<Xns95395FD0EF85duncanrcpcouk at 127.0.0.1>...
> Christopher T King <squirrel at WPI.EDU> wrote in
> news:Pine.LNX.4.44.0408011840050.21160-100000 at ccc4.wpi.edu: 
> 
> > On 1 Aug 2004, Duncan Booth wrote:
> > 
> >> I just had a brief look at the code you posted. Are you not concerned
> >> about accuracy in any of your calculations? Summing a 10 million
> >> element array by simply accumulating each element into a running
> >> total is guaranteed to give a lousy result.
> > 
> > Lousy or not, I believe that's how numarray is implemented internally,
> > so at least all the benchmarks are the same.  If you want accuracy
> > summing that many numbers, you're going to have to do it in software
> > (likely by summing each mantissa bit individually and reconstructing
> > the float afterward), so it will be abysmally slow (obviously not what
> > the OP wants).
> > 
> 
> My point being that speed isn't everything. Most applications doing large 
> floating point calculations should be concerned about accuracy, and how not 
> to add up a large array of floating point numbers was one of the first 
> things I was taught in computer science classes. The fastest way to sum 10 
> million numbers (albeit at some considerable loss of accuracy):
> 
>     	return 0
> 
> 
> The 'correct' way to sum a large array of floats is, of course, to 
> calculate a lot of partial sums and sum them. For example, naively, we 
> might say that to sum the array we sum the first half, sum the second half 
> and add the results together. This definition being recursive ensures that 
> if the inputs are all of a similar size then all the intermediate 
> calculations involve summing similarly sized values. A more sophisticated 
> technique might also try to handle the case where not all values are a 
> similar size.
> 
> If you are worried about speed, then calculating partial sums is also the 
> way to go: the naive technique used by the original poster leaves no scope 
> for parallel calculation. It should be possible to write a slightly more 
> complex loop in the C version that runs a lot faster on systems that are 
> capable of performing multiple floating point instructions in parallel.
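
(A direct recursive version of the partial-sum scheme described above,
just to spell out the idea -- this is not what I benchmarked:)

def recursive_sum(arr, lo=0, hi=None):
  # Sum arr[lo:hi] by splitting the range in half, summing each half,
  # and adding the two partial sums together.
  if hi is None:
    hi = len(arr)
  if hi - lo <= 2:
    return sum(arr[lo:hi])
  mid = (lo + hi) // 2
  return recursive_sum(arr, lo, mid) + recursive_sum(arr, mid, hi)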

You are right, naive summing produces significant accuracy losses.
I estimated the error by summing [0.123456789012345]*10000000 and
comparing the result to the exact value 1234567.89012345. All of the
methods benchmarked have an error of about 1e-4.

The method below sums the array at the same speed as a regular Python
sum loop, but has an error of less than 1e-15.

from math import ceil, log

def sum2(arr):
  # Pairwise (tree) summation; note that it modifies arr in place.
  size = len(arr)
  # ceil(log2(size)) passes: on pass itr, add together elements that
  # are 2**itr apart, so intermediate sums stay similar in magnitude.
  for itr in range(int(ceil(log(size)/log(2)))):
    for i in xrange(0, size, 2**(itr+1)):
      next_i = i + 2**itr
      if next_i < size:
        arr[i] += arr[next_i]
  return arr[0]
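
Roughly, the comparison looks like this (a sketch, not the exact
script I ran):

arr = [0.123456789012345] * 10000000
exact = 1234567.89012345

naive = 0.0
for x in arr:
  naive += x

pairwise = sum2(arr[:])        # pass a copy, since sum2 modifies its argument

print abs(naive - exact)       # on the order of 1e-4
print abs(pairwise - exact)    # on the order of 1e-15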


When arbitrary-precision numbers are used, this method costs O(nd)
vs. O(n(d + log n)) for regular summing, where n is the size of the
array and d is the number of digits of the elements. In practice I got
about a 20% performance increase when summing an array of 1000
Decimals with the method above.
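
A rough sketch of that Decimal comparison (the exact timing harness I
used may differ):

import time
from decimal import Decimal

vals = [Decimal("0.123456789012345")] * 1000

def naive_decimal_sum(arr):
  total = Decimal(0)
  for x in arr:
    total += x
  return total

t0 = time.clock()
naive_decimal_sum(vals)
t1 = time.clock()
sum2(list(vals))              # copy, since sum2 overwrites elements in place
t2 = time.clock()

print "naive:   ", t1 - t0
print "pairwise:", t2 - t1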

A better estimate of the error difference would use random digits
rather than [x]*10000000, but I don't know how I would calculate the
exact answer in any reasonable amount of time (summing a 4000-element
array with Decimals, including the conversion, takes over a second).

Yaroslav


