Please help - get average

Wed Oct 27 14:21:30 EDT 2004

Frohnhofer, James wrote:
>>Robin Becker wrote
>>...... interestingly the standard algorithms for means/deviations are 
>>numerically poor. As an example
>>
>> >>> (1.e30+1-1.0e30)/3
>>0.0
>>
>>Where we have total loss of information in one of the terms.
>>
>>
>>There are better algorithms involving recursion,
>>
>>eg
>>
>>    mu[n] = (data[n]+(n-1)*mu[n-1])/n
>>The error in the existing estimate, mu[n-1], gets multiplied by the 
>>number (n-1)/n which is less than one (ie the errors are damped).
>>
> 
> 
> I'm not sure I believe this.  It's been a while since I studied this, but I
> think I remember maintaining precision at the Frankenstein level:
> "Multiplication bad.  Addition good."  Keep the multiplications and divisions
> to a minimum, and try to do them at the end.
> 
> 
>>However, in the case of widely differing magnitudes I think 
>>you need to 
>>accumulate the sums specially.
> 
> 
> 
> Well, its not even that hard:
> 
>  >>> data=[1.e30,1.,-1.e30]
>  >>> sum(data)/len(data)
>  0.0
>  >>> data.sort(lambda x,y:-cmp(abs(x),abs(y)))
>  >>> data
>  [-1e+030, 1e+030, 1.0]
>  >>> sum(data)/len(data)  #Assuming sum works the way I think it should . . .
>  0.33333333333333331
> 
> But using the recursive 
>  >>> def mu(lst):
>          n = len(lst)
>          if n==1:
>              return lst[0]
>          else:
>              return (lst[0] + ((n-1)*mu(lst[1:])))/n
> 
>  >>> data=[1.e30,1.,-1.e30]
>  >>> mu(data)
>  0.0
>  >>> data.sort(lambda x,y:-cmp(abs(x),abs(y)))
>  >>> mu(data)
>  0.0
> 
> it seems we lose even the ability to maintain the precision by sorting on the
> magnitude. 

not so, the sort needs to be reverse absolute for the simplistic analysis to work
and even then I believe it may not suffice.

eg
 >>> def mu(data):
... 	estimate = 0
... 	for i,v in enumerate(data):
... 		estimate = (v+i*estimate)/float(i+1)
... 	return estimate
...
 >>> data=[1.e30,1.,-1.e30]
 >>> data.sort(lambda x,y: -cmp(abs(x),abs(y)))
 >>> data
[1e+030, -1e+030, 1.0]
 >>> mu(data)
0.33333333333333331

Of course we now have an O(nLog(n)) average rather than O(n) so the numericists 
will blow a gasket.
-- 
Robin Becker