[Numpy-discussion] Reducing the rounding error of np.sum

Thu Jul 4 15:33:43 EDT 2013

On Thu, Jul 4, 2013 at 12:15 PM, Julian Taylor <
jtaylor.debian at googlemail.com> wrote:

> hi,
> numpys implementation of sum is just a simple:
> for i in d:
>    sum += d[i]
>
> this suffers from rather high rounding errors in the order of the d.size
> * epsilon. Consider:
> (np.ones(50000) / 10.).sum()
> 5000.0000000006585
>
> There are numerous algorithms which reduce the error of this operation.
> E.g. python implements one in math.fsum which is accurate but slow
> compared to np.sum [0].
>
> Numpy is currently lacking a precise summation, but I think it would
> make sense to add one.
> The question is whether we go with the python approach of adding a new
> function which is slower but more precise or if we even change the
> default summation algorithm (or do nothing :) ).
>
> For a new function I guess the method used in python itself makes sense,
> its probably well chosen by the python developers (though I did not
> lookup the rational for the choice yet).
>
> For replacing the default, two algorithms come to my mind, pairwise
> summation [1] and kahan summation (compensated sum) [2].
> pairwise summation adds in pairs so usually the magnitude of the two
> operands is the same magnitude, this produces an error of O(log n *
> epsilon) for the common case.
> This algorithm has the advantage that it is almost as fast as the naive
> sum with an reasonable error.
> Problematic might be the buffering numpy does when reducing, this would
> limit the error reduction to the buffer size.
>
> kahan summation adds some extra operations to recover the rounding
> errors. This results in an error of o(epsilon).
> It is four times slower than the naive summation but this can be
> (partially) compensated by vectorizing it.
> It has the advantage of lower error, simpler implementation and
> buffering does not interfere.
>
> I did prototype implementations (only unit strides) of both in [3, 4].
>
> Any thoughts on this?
>
>
> [0]
>
> http://docs.python.org/2/library/math.html#number-theoretic-and-representation-functions
> [1] http://en.wikipedia.org/wiki/Pairwise_summationg
> [2] http://en.wikipedia.org/wiki/Kahan_summation_algorithm
> [3] https://github.com/juliantaylor/numpy/tree/pairwise
> [4] https://github.com/juliantaylor/numpy/tree/kahan
>

I think this would be useful as part of a bigger package that included
accurate mean, var, and std. In particular, the need for more accurate
means and variances have been discussed on the list before, but no one has
stepped forward to do anything about them.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20130704/a8aaee8a/attachment.html>