Feature suggestion: sum() ought to use a compensated summation algorithm

Nick Craig-Wood nick at craig-wood.com
Sat May 3 14:30:03 EDT 2008


Szabolcs Horvát <szhorvat at gmail.com> wrote:
>  I did the following calculation:  Generated a list of a million random 
>  numbers between 0 and 1, constructed a new list by subtracting the mean 
>  value from each number, and then calculated the mean again.
> 
>  The result should be 0, but of course it will differ from 0 slightly 
>  because of rounding errors.
> 
>  However, I noticed that the simple Python program below gives a result 
>  of ~ 10^-14, while an equivalent Mathematica program (also using double 
>  precision) gives a result of ~ 10^-17, i.e. three orders of magnitude 
>  more precise.
> 
>  Here's the program (pardon my style, I'm a newbie/occasional user):
> 
>  from random import random
> 
>  data = [random() for x in xrange(1000000)]
> 
>  mean = sum(data)/len(data)
>  print sum(x - mean for x in data)/len(data)
> 
>  A little research shows that Mathematica uses a "compensated summation" 
>  algorithm.  Indeed, using the algorithm described at
>  http://en.wikipedia.org/wiki/Kahan_summation_algorithm
>  gives us a result around ~ 10^-17:

I was taught in my numerical methods course (about 20 years ago now!)
that the way to do this sum with most accuracy is to sum from the
smallest magnitude to the largest magnitude.

Eg

  >>> from random import random
  >>> data = [random() for x in xrange(1000000)]
  >>> mean = sum(data)/len(data)
  >>> print sum(x - mean for x in data)/len(data)
  1.64152134108e-14
  >>> mean = sum(sorted(data))/len(data)
  >>> print sum(x - mean for x in data)/len(data)
  5.86725534824e-15
  >>>

It isn't as good as the compensated sum but it is easy!

>  I thought that it would be very nice if the built-in sum() function used 
>  this algorithm by default.  Has this been brought up before?  Would this 
>  have any disadvantages (apart from a slight performance impact, but 
>  Python is a high-level language anyway ...)?

sum() gets used for any numerical types not just floats...

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick



More information about the Python-list mailing list