Feature suggestion: sum() ought to use a compensated summation algorithm

Szabolcs Horvát szhorvat at gmail.com
Sat May 3 12:50:34 EDT 2008


I did the following calculation:  I generated a list of a million random 
numbers between 0 and 1, constructed a new list by subtracting the mean 
value from each number, and then calculated the mean of the new list.

The result should be 0, but of course it will differ from 0 slightly 
because of rounding errors.
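(To see the kind of error I mean, here is a minimal example at the 
interactive prompt; in IEEE double precision even a two-term sum is 
not exact:

>>> 0.1 + 0.2
0.30000000000000004

Each individual addition can be off by up to half an ulp, and over a 
million terms these small errors accumulate.)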

However, I noticed that the simple Python program below gives a result 
of ~ 10^-14, while an equivalent Mathematica program (also using double 
precision) gives a result of ~ 10^-17, i.e. three orders of magnitude 
more precise.

Here's the program (pardon my style, I'm a newbie/occasional user):

from random import random

# one million uniform random numbers in [0, 1)
data = [random() for x in xrange(1000000)]

mean = sum(data)/len(data)
# mean of the centred data; mathematically this is exactly 0
print sum(x - mean for x in data)/len(data)

A little research shows that Mathematica uses a "compensated summation" 
algorithm.  Indeed, using the algorithm described at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
gives a result of ~ 10^-17:

def compSum(arr):
    # Kahan compensated summation: c carries the running rounding
    # error so it can be fed back into subsequent additions.
    s = 0.0   # running sum
    c = 0.0   # compensation (the low-order bits lost so far)
    for x in arr:
        y = x - c        # correct the new term by the carried error
        t = s + y        # add; low-order bits of y may be lost here
        c = (t - s) - y  # recover what was lost (algebraically zero)
        s = t
    return s

mean = compSum(data)/len(data)
print compSum(x - mean for x in data)/len(data)
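
To illustrate what the compensation buys, here is a small contrived 
case (not from my test above, just an illustration): a million tiny 
terms that plain sum() drops entirely, because each one falls below 
the precision of the running total:

data2 = [1.0] + [1e-16] * 10**6
print sum(data2)      # 1.0 -- every 1e-16 is rounded away
print compSum(data2)  # ~1.0000000001, the mathematically correct value

Since 1.0 + 1e-16 rounds back to 1.0 in double precision, the naive 
sum never moves; the compensation variable c accumulates the lost 
1e-16 terms and feeds them back in.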


I thought that it would be very nice if the built-in sum() function used 
this algorithm by default.  Has this been brought up before?  Would it 
have any disadvantages, apart from a slight performance impact (but 
Python is a high-level language anyway ...)?

Szabolcs Horvát
