Lists: Converting Double to Single

Tue Feb 27 13:53:08 EST 2007

Steven D'Aprano <steve at REMOVE.THIS.cybersource.com.au> wrote:

> If the values vary greatly in magnitude, you probably want to add them
> from smallest to biggest; other than that, how else can you calculate the
> mean?
> 
It doesn't have to be smallest to largest, but the important thing is not 
to be adding the millionth element to the sum of the preceding 999999 
values. So if I remember correctly one option is to add pairs of numbers, 
then sum the pairs and so on.

I think my point really is that nobody is going to bother doing this when 
they just want to sum a few numbers, and I'm not really surprised that the 
builtin sum doesn't do it either, but I was slightly surprised that numpy 
doesn't try to minimise loss of precision.

Here's a quick attempt at implementing it which demonstrates how the 
builtin sum loses precision somewhat faster than a pairwise sum (and is 
rather dependant on the input order). sumpair is of course a lot slower 
than the builtin sum, but it might be interesting to see how a C coded 
version compared. I can't see that it would necessarily be much slower: it 
just adds some counters, bit twiddling and a short temporary array to the 
calculation.

C:\temp>sumpairs.py
builtin sum  500000.00000083662
pairwise sum 500000.00000499998
sorted
builtin sum  500000.00000499998
pairwise sum 500000.00000499998
reverse sorted
builtin sum  500000.0
pairwise sum 500000.00000500004
all the same
builtin sum  1000000.0000016732
pairwise sum 1000000.00001

------------ sumpairs.py ------------------------------------------
# Sum implemented so that intermediate results sum a similar number
# of the input values.
def sumpairs(seq):
    tmp = []
    for i,v in enumerate(seq):
        if i&1:
            tmp[-1] += v
            i = i + 1
            n = i & -i
            while n > 2:
                t = tmp.pop(-1)
                tmp[-1] = tmp[-1] + t
                n >>= 1
        else:
            tmp.append(v)

    tmp.reverse()
    return sum(tmp)

v = [1000000, 0.00001] * 1000000
print "builtin sum ", repr(sum(v)/len(v))
print "pairwise sum", repr(sumpairs(v)/len(v))

print "sorted"
v.sort()
print "builtin sum ", repr(sum(v)/len(v))
print "pairwise sum", repr(sumpairs(v)/len(v))

print "reverse sorted"
v.reverse()
print "builtin sum ", repr(sum(v)/len(v))
print "pairwise sum", repr(sumpairs(v)/len(v))

print "all the same"
v = [1000000.00001] * 1000000
print "builtin sum ", repr(sum(v)/len(v))
print "pairwise sum", repr(sumpairs(v)/len(v))-----------------------------
----------------------------------

In case the code is a bit hard to follow this version makes it clearer what 
is happening:

------------- sumpair2.py ---------------
def sumpairs(seq):
    tmp = []
    for i,v in enumerate(seq):
        if i&1:
            tmp[-1] += v
            print "add   ", tmp
            i = i + 1
            n = i & -i
            while n > 2:
                t = tmp.pop(-1)
                tmp[-1] = tmp[-1] + t
                n >>= 1
                print "reduce", tmp
        else:
            tmp.append(v)
            print "append", tmp

    tmp.reverse()
    return sum(tmp)

print sumpairs([1]*40)
-----------------------------------------
which gives the output:

append [1]
add    [2]
append [2, 1]
add    [2, 2]
reduce [4]
append [4, 1]
add    [4, 2]
append [4, 2, 1]
add    [4, 2, 2]
reduce [4, 4]
reduce [8]
append [8, 1]
add    [8, 2]
append [8, 2, 1]
add    [8, 2, 2]
reduce [8, 4]
append [8, 4, 1]
add    [8, 4, 2]
append [8, 4, 2, 1]
add    [8, 4, 2, 2]
reduce [8, 4, 4]
reduce [8, 8]
reduce [16]
append [16, 1]
add    [16, 2]
append [16, 2, 1]
add    [16, 2, 2]
reduce [16, 4]
append [16, 4, 1]
add    [16, 4, 2]
append [16, 4, 2, 1]
add    [16, 4, 2, 2]
reduce [16, 4, 4]
reduce [16, 8]
append [16, 8, 1]
add    [16, 8, 2]
append [16, 8, 2, 1]
add    [16, 8, 2, 2]
reduce [16, 8, 4]
append [16, 8, 4, 1]
add    [16, 8, 4, 2]
append [16, 8, 4, 2, 1]
add    [16, 8, 4, 2, 2]
reduce [16, 8, 4, 4]
reduce [16, 8, 8]
reduce [16, 16]
reduce [32]
append [32, 1]
add    [32, 2]
append [32, 2, 1]
add    [32, 2, 2]
reduce [32, 4]
append [32, 4, 1]
add    [32, 4, 2]
append [32, 4, 2, 1]
add    [32, 4, 2, 2]
reduce [32, 4, 4]
reduce [32, 8]
40