Lists: Converting Double to Single
Duncan Booth
duncan.booth at invalid.invalid
Fri Mar 2 04:22:50 EST 2007
Jussi Salmela <tiedon_jano at hotmail.com> wrote:
> I've run a couple of tests and it seems to me that Dennis Lee Bieber
is
> on the trail of the truth when he claims that smallest magnitude to
> the largest is the way to do the summation. Actually it isn't THE way
> although it diminishes the error. I was sort of astonished because I
> also had somewhere along the years formed the same notion as Dennis.
>
> "Your" pairwise method beats the former method by a large margin
> although I don't understand why. To tell you the truth: I started out
to
> show you were wrong because intuitively (for me at least) the former
> method should work better than yours.
I think Dennis is spot on when he says that smallest to largest is the
best way. What he has missed though is that if you have a list of
similar sized values, then after you have added the first two together
the next addition still needs to follow the smallest to largest rule so
it shouldn't include the result of the first addition. One option, if
you have access to the full list when you start, would be to reinsert
each partial sum back into the sorted list as you calculate it: a heap
queue might be an appropriate implementation for that.
The pairwise sum gives you most, although not quite all, of the benefit
of sorting the data: obviously if the input values are identical it is
almost exactly equivalent to adding the smallest values and then
inserting the partial result back into the list. Even in the 1000000,
0.00001 case it avoids most of the precision-losing additions such as
500000000005.0 + 0.00001 which the simple sum hits.
Also it has the further advantages of still working with other summable
objects (e.g. lists) so long as they are associative and not requiring
you to know all the values or even the length of the sequence in
advance.
For floating point you can of course use methods to preserve the lost
bits such as the kahan sum, or even just do all imtermediate
calculations to a higher internal precision.
By the way, I found another bug in my sumpairs: += is a very bad idea
here if the input can contain mutables, so the relevant line should be:
tmp[-1] = tmp[-1] + v
Once I fixed that one, so it no longer attempts to modify lists in-
place, it appears that the more balanced intermediate results give a
benefit when summing lists as well. sumpairs on a list of lists seems to
beat the builtin sum as soon as the length of the list exceeds about 210
items. At 1000 items it is about 3 times faster and 30 times faster at
10,000 items. If I get the time I think I really should code up a C
version of sumpairs and see how that compares against Pythons builtin
sum for numerical sums.
I guess either everyone else missed that computer science lecture I
thought I remembered, or maybe I was the one who was asleep and I
dreamed it all. :)
More information about the Python-list
mailing list