Lists: Converting Double to Single
Duncan Booth
duncan.booth at invalid.invalid
Tue Feb 27 13:53:08 EST 2007
Steven D'Aprano <steve at REMOVE.THIS.cybersource.com.au> wrote:
> If the values vary greatly in magnitude, you probably want to add them
> from smallest to biggest; other than that, how else can you calculate the
> mean?
>
It doesn't have to be smallest to largest, but the important thing is not
to be adding the millionth element to the sum of the preceding 999999
values. So if I remember correctly one option is to add pairs of numbers,
then sum the pairs and so on.
I think my point really is that nobody is going to bother doing this when
they just want to sum a few numbers, and I'm not really surprised that the
builtin sum doesn't do it either, but I was slightly surprised that numpy
doesn't try to minimise loss of precision.
Here's a quick attempt at implementing it which demonstrates how the
builtin sum loses precision somewhat faster than a pairwise sum (and is
rather dependant on the input order). sumpair is of course a lot slower
than the builtin sum, but it might be interesting to see how a C coded
version compared. I can't see that it would necessarily be much slower: it
just adds some counters, bit twiddling and a short temporary array to the
calculation.
C:\temp>sumpairs.py
builtin sum 500000.00000083662
pairwise sum 500000.00000499998
sorted
builtin sum 500000.00000499998
pairwise sum 500000.00000499998
reverse sorted
builtin sum 500000.0
pairwise sum 500000.00000500004
all the same
builtin sum 1000000.0000016732
pairwise sum 1000000.00001
------------ sumpairs.py ------------------------------------------
# Sum implemented so that intermediate results sum a similar number
# of the input values.
def sumpairs(seq):
tmp = []
for i,v in enumerate(seq):
if i&1:
tmp[-1] += v
i = i + 1
n = i & -i
while n > 2:
t = tmp.pop(-1)
tmp[-1] = tmp[-1] + t
n >>= 1
else:
tmp.append(v)
tmp.reverse()
return sum(tmp)
v = [1000000, 0.00001] * 1000000
print "builtin sum ", repr(sum(v)/len(v))
print "pairwise sum", repr(sumpairs(v)/len(v))
print "sorted"
v.sort()
print "builtin sum ", repr(sum(v)/len(v))
print "pairwise sum", repr(sumpairs(v)/len(v))
print "reverse sorted"
v.reverse()
print "builtin sum ", repr(sum(v)/len(v))
print "pairwise sum", repr(sumpairs(v)/len(v))
print "all the same"
v = [1000000.00001] * 1000000
print "builtin sum ", repr(sum(v)/len(v))
print "pairwise sum", repr(sumpairs(v)/len(v))-----------------------------
----------------------------------
In case the code is a bit hard to follow this version makes it clearer what
is happening:
------------- sumpair2.py ---------------
def sumpairs(seq):
tmp = []
for i,v in enumerate(seq):
if i&1:
tmp[-1] += v
print "add ", tmp
i = i + 1
n = i & -i
while n > 2:
t = tmp.pop(-1)
tmp[-1] = tmp[-1] + t
n >>= 1
print "reduce", tmp
else:
tmp.append(v)
print "append", tmp
tmp.reverse()
return sum(tmp)
print sumpairs([1]*40)
-----------------------------------------
which gives the output:
append [1]
add [2]
append [2, 1]
add [2, 2]
reduce [4]
append [4, 1]
add [4, 2]
append [4, 2, 1]
add [4, 2, 2]
reduce [4, 4]
reduce [8]
append [8, 1]
add [8, 2]
append [8, 2, 1]
add [8, 2, 2]
reduce [8, 4]
append [8, 4, 1]
add [8, 4, 2]
append [8, 4, 2, 1]
add [8, 4, 2, 2]
reduce [8, 4, 4]
reduce [8, 8]
reduce [16]
append [16, 1]
add [16, 2]
append [16, 2, 1]
add [16, 2, 2]
reduce [16, 4]
append [16, 4, 1]
add [16, 4, 2]
append [16, 4, 2, 1]
add [16, 4, 2, 2]
reduce [16, 4, 4]
reduce [16, 8]
append [16, 8, 1]
add [16, 8, 2]
append [16, 8, 2, 1]
add [16, 8, 2, 2]
reduce [16, 8, 4]
append [16, 8, 4, 1]
add [16, 8, 4, 2]
append [16, 8, 4, 2, 1]
add [16, 8, 4, 2, 2]
reduce [16, 8, 4, 4]
reduce [16, 8, 8]
reduce [16, 16]
reduce [32]
append [32, 1]
add [32, 2]
append [32, 2, 1]
add [32, 2, 2]
reduce [32, 4]
append [32, 4, 1]
add [32, 4, 2]
append [32, 4, 2, 1]
add [32, 4, 2, 2]
reduce [32, 4, 4]
reduce [32, 8]
40
More information about the Python-list
mailing list