Lists: Converting Double to Single

Fri Mar 2 04:22:50 EST 2007

Jussi Salmela <tiedon_jano at hotmail.com> wrote:

> I've run a couple of tests and it seems to me that Dennis Lee Bieber 
is 
>   on the trail of the truth when he claims that smallest magnitude to 
> the largest is the way to do the summation. Actually it isn't THE way 
> although it diminishes the error. I was sort of astonished because I 
> also had somewhere along the years formed the same notion as Dennis.
> 
> "Your" pairwise method beats the former method by a large margin 
> although I don't understand why. To tell you the truth: I started out 
to 
> show you were wrong because intuitively (for me at least) the former 
> method should work better than yours.

I think Dennis is spot on when he says that smallest to largest is the 
best way. What he has missed though is that if you have a list of 
similar sized values, then after you have added the first two together 
the next addition still needs to follow the smallest to largest rule so 
it shouldn't include the result of the first addition. One option, if 
you have access to the full list when you start, would be to reinsert 
each partial sum back into the sorted list as you calculate it: a heap 
queue might be an appropriate implementation for that.

The pairwise sum gives you most, although not quite all, of the benefit 
of sorting the data: obviously if the input values are identical it is 
almost exactly equivalent to adding the smallest values and then 
inserting the partial result back into the list. Even in the 1000000, 
0.00001 case it avoids most of the precision-losing additions such as 
500000000005.0 + 0.00001 which the simple sum hits.

Also it has the further advantages of still working with other summable 
objects (e.g. lists) so long as they are associative and not requiring 
you to know all the values or even the length of the sequence in 
advance.

For floating point you can of course use methods to preserve the lost 
bits such as the kahan sum, or even just do all imtermediate 
calculations to a higher internal precision.

By the way, I found another bug in my sumpairs: += is a very bad idea 
here if the input can contain mutables, so the relevant line should be:

            tmp[-1] = tmp[-1] + v

Once I fixed that one, so it no longer attempts to modify lists in-
place, it appears that the more balanced intermediate results give a 
benefit when summing lists as well. sumpairs on a list of lists seems to 
beat the builtin sum as soon as the length of the list exceeds about 210 
items. At 1000 items it is about 3 times faster and 30 times faster at 
10,000 items. If I get the time I think I really should code up a C 
version of sumpairs and see how that compares against Pythons builtin 
sum for numerical sums.

I guess either everyone else missed that computer science lecture I 
thought I remembered, or maybe I was the one who was asleep and I 
dreamed it all. :)