Lists: Converting Double to Single

Thu Mar 1 13:05:54 EST 2007

Duncan Booth kirjoitti:
> Dennis Lee Bieber <wlfraed at ix.netcom.com> wrote:
> 
>>      For floating point, smallest magnitude to largest IS the most
>> precise.
>>
>>      Pretend you only have 2 significant digits of precision...
>>
>>      10 + 0.4 + 0.4 + 0.4 => 10
>>
>>      0.4 + 0.4 + 0.4 + 10 => 11
> 
> and if you try the way I suggested then initial input order doesn't 
> matter:
> 
>    (10 + 0.4) = 10, (0.4 + 0.4) = 0.8, (10 + 0.8) = 11
>    (0.4 + 0.4) = 0.8, (10 + 0.4) = 10, (0.8 + 10) = 11
> 
> 
> Pretend you ran the example code I posted. Specifically the bit where 
> the output is:
> 
> all the same
> builtin sum  1000000.0000016732
> pairwise sum 1000000.00001
> 
> It isn't so much the order of the initial values that matters 
> (especially if the values are all similar). What *really* matters is 
> keeping the magnitude of the intermediate results as small as possible. 
> 
> Summing in pairs ensures that you keep the precision as long as 
> possible. The order of the initial values is less important than this 
> (although it does still have a minor effect). For a solution which works 
> with a sequence of unknown length and doesn't require lookahead I don't 
> think you can do much better than that.
> 
> BTW, in case someone thinks my example numbers are a bit too weird you 
> can show the same problem averaging a list of 10 digit floating point 
> numbers with exact representations:
> 
> v = [9999999999.] * 10000000
> print "builtin sum ", (sum(v)/len(v))
> print "pairwise sum", (sumpairs(v)/len(v))
> 
> 
> gives:
> builtin sum  9999999999.91
> pairwise sum 9999999999.0
> 
> In both cases the total is large enough to go beyond the range of 
> integers that can be expressed exactly in floating point and as soon as 
> that happens the builtin sum loses precision on every subsequent 
> addition. pairwise summation only loses precision on a very few of the 
> additions.
> 
> P.S. Apologies for the sloppy coding in the sumpairs function. It should 
> of course do the final addition manually otherwise it is going to give 
> odd results summing lists or strings or anything else where the order 
> matters. Revised version:
> 
> def sumpairs(seq):
>     tmp = []
>     for i,v in enumerate(seq):
>         if i&1:
>             tmp[-1] += v
>             i = i + 1
>             n = i & -i
>             while n > 2:
>                 t = tmp.pop(-1)
>                 tmp[-1] = tmp[-1] + t
>                 n >>= 1
>         else:
>             tmp.append(v)
> 
>     while len(tmp) > 1:
>         t = tmp.pop(-1)
>         tmp[-1] = tmp[-1] + t
>     return tmp[0]
> 
> 

Warning: I'm no floating point guru! Awfully long post following!

I've run a couple of tests and it seems to me that Dennis Lee Bieber is 
  on the trail of the truth when he claims that smallest magnitude to 
the largest is the way to do the summation. Actually it isn't THE way 
although it diminishes the error. I was sort of astonished because I 
also had somewhere along the years formed the same notion as Dennis.

"Your" pairwise method beats the former method by a large margin 
although I don't understand why. To tell you the truth: I started out to 
show you were wrong because intuitively (for me at least) the former 
method should work better than yours.

I also found a method called "Kahan summation algorithm" which probably 
is the best way to do this.

What have I done then? I used your examples and code and added a 
straight summation and the Kahan method using the pseudocode presented 
in the Wikipedia article. I also made an experiment using randomized 
floats merged wildly into a crazy list.

The results (as I interpret them):
1. The straight summation is the same method that the builtin sum uses 
because their results are equal(ly poor).
2. Sorting to increasing order helps these two methods to have better 
accuracy.
3. The pairwise method is almost as good as the Kahan method.

Here's the code and the results:

=========================================================================
CODE:
=========================================================================
import random

def sumpairs(seq):
     tmp = []
     for i,v in enumerate(seq):
         if i&1:
             tmp[-1] += v
             i = i + 1
             n = i & -i
             while n > 2:
                 t = tmp.pop(-1)
                 tmp[-1] = tmp[-1] + t
                 n >>= 1
         else:
             tmp.append(v)

     while len(tmp) > 1:
         t = tmp.pop(-1)
         tmp[-1] = tmp[-1] + t
     return tmp[0]

def straightSum(seq):
     result = 0.0
     for elem in seq: result += elem
     return result

def kahanSum(input):
     sum = input[0]
     n = len(input)
     c = 0.0                 # A running compensation for lost low-order 
bits.
     for i in range(1, n):
         y = input[i] - c    # So far, so good: c is zero.
         t = sum + y         # Alas, sum is big, y small,
                             # so low-order digits of y are lost.
         c = (t - sum) - y   # (t - sum) recovers the high-order part of y;
                             # subtracting y recovers -(low part of y)
         sum = t             # Algebraically, c should always be zero.
                             # Beware eagerly optimising compilers!
         continue            # Next time around, the lost low part will be
                             # added to y in a fresh attempt.
     return sum

print '\n======================> v = [1000000, 0.00001] * 1000000'
v = [1000000, 0.00001] * 1000000
print "CORRECT ", '500000.000005         <========'
print "builtin ", repr(sum(v)/len(v))
print "straight", repr(straightSum(v)/len(v))
print "pairwise", repr(sumpairs(v)/len(v))
print "Kahan   ", repr(kahanSum(v)/len(v))

print "sorted"
v.sort()
print "builtin ", repr(sum(v)/len(v))
print "straight", repr(straightSum(v)/len(v))
print "pairwise", repr(sumpairs(v)/len(v))
print "Kahan   ", repr(kahanSum(v)/len(v))

print "reverse sorted"
v.reverse()
print "builtin ", repr(sum(v)/len(v))
print "straight", repr(straightSum(v)/len(v))
print "pairwise", repr(sumpairs(v)/len(v))
print "Kahan   ", repr(kahanSum(v)/len(v))

print '\n======================> v = [1000000]*1000000 + [0.00001]*1000000'
v = [1000000]*1000000 + [0.00001]*1000000
print "CORRECT ", '500000.000005         <========'
print "builtin ", repr(sum(v)/len(v))
print "straight", repr(straightSum(v)/len(v))
print "pairwise", repr(sumpairs(v)/len(v))
print "Kahan   ", repr(kahanSum(v)/len(v))

print '\n======================> v = [0.00001]*1000000 + [1000000]*1000000'
v = [0.00001]*1000000 + [1000000]*1000000
print "CORRECT ", '500000.000005         <========'
print "builtin ", repr(sum(v)/len(v))
print "straight", repr(straightSum(v)/len(v))
print "pairwise", repr(sumpairs(v)/len(v))
print "Kahan   ", repr(kahanSum(v)/len(v))

print '\n======================> v = [1000000.00001] * 1000000'
v = [1000000.00001] * 1000000
print "CORRECT ", '1000000.00001         <========'
print "builtin ", repr(sum(v)/len(v))
print "straight", repr(straightSum(v)/len(v))
print "pairwise", repr(sumpairs(v)/len(v))
print "Kahan   ", repr(kahanSum(v)/len(v))

print '\n======================> Randomized lists'
print '========> lst1 = [random.random() for i in range(1000000)]'
print '========> lst2 = [1000000*random.random() for i in range(1000000)]'
N = 100000
lst1 = [random.random() for i in range(N)]
lst2 = [1000000.0 + random.random() for i in range(N)]
v = []
for i in range(N):
     v.append(lst1[i])
     v.append(lst2[i])
     v.append(lst2[i])
     v.append(lst2[i])
     v.append(lst1[i])
     v.append(lst2[i])
     v.append(lst2[i])
     v.append(lst2[i])
     v.append(lst2[i])
print '========> v = "Crazy merge of lst1 and lst2'
print 'min, max'
print 'lst1:', min(lst1), max(lst1)
print 'lst2:', min(lst2), max(lst2)
print "builtin ", repr(sum(v)/len(v))
print "straight", repr(straightSum(v)/len(v))
print "pairwise", repr(sumpairs(v)/len(v))
print "Kahan   ", repr(kahanSum(v)/len(v))

print "sorted"
v.sort()
print "builtin ", repr(sum(v)/len(v))
print "straight", repr(straightSum(v)/len(v))
print "pairwise", repr(sumpairs(v)/len(v))
print "Kahan   ", repr(kahanSum(v)/len(v))

print "reverse sorted"
v.reverse()
print "builtin ", repr(sum(v)/len(v))
print "straight", repr(straightSum(v)/len(v))
print "pairwise", repr(sumpairs(v)/len(v))
print "Kahan   ", repr(kahanSum(v)/len(v))
=========================================================================
RESULTS:
=========================================================================

======================> v = [1000000, 0.00001] * 1000000
CORRECT  500000.000005         <========
builtin  500000.00000083662
straight 500000.00000083662
pairwise 500000.00000499998
Kahan    500000.00000499998
sorted
builtin  500000.00000499998
straight 500000.00000499998
pairwise 500000.00000499998
Kahan    500000.00000499998
reverse sorted
builtin  500000.0
straight 500000.0
pairwise 500000.00000500004
Kahan    500000.00000499998

======================> v = [1000000]*1000000 + [0.00001]*1000000
CORRECT  500000.000005         <========
builtin  500000.0
straight 500000.0
pairwise 500000.00000500004
Kahan    500000.00000499998

======================> v = [0.00001]*1000000 + [1000000]*1000000
CORRECT  500000.000005         <========
builtin  500000.00000499998
straight 500000.00000499998
pairwise 500000.00000499998
Kahan    500000.00000499998

======================> v = [1000000.00001] * 1000000
CORRECT  1000000.00001         <========
builtin  1000000.0000016732
straight 1000000.0000016732
pairwise 1000000.00001
Kahan    1000000.00001

======================> Randomized lists
========> lst1 = [random.random() for i in range(1000000)]
========> lst2 = [1000000*random.random() for i in range(1000000)]
========> v = "Crazy merge of lst1 and lst2
min, max
lst1: 1.59494420963e-005 0.999990993581
lst2: 1000000.00001 1000001.0
builtin  777778.27774196747
straight 777778.27774196747
pairwise 777778.27774199261
Kahan    777778.27774199261
sorted
builtin  777778.2777420698
straight 777778.2777420698
pairwise 777778.27774199261
Kahan    777778.27774199261
reverse sorted
builtin  777778.27774202416
straight 777778.27774202416
pairwise 777778.27774199261
Kahan    777778.27774199261
=========================================================================

Cheers,
Jussi