Please help - get average
Robin Becker
robin at reportlab.com
Wed Oct 27 14:21:30 EDT 2004
Frohnhofer, James wrote:
>>Robin Becker wrote
>>...... interestingly the standard algorithms for means/deviations are
>>numerically poor. As an example
>>
>> >>> (1.e30+1-1.0e30)/3
>>0.0
>>
>>Where we have total loss of information in one of the terms.
>>
>>
>>There are better algorithms involving recursion,
>>
>>eg
>>
>> mu[n] = (data[n]+(n-1)*mu[n-1])/n
>>The error in the existing estimate, mu[n-1], gets multiplied by the
>>number (n-1)/n which is less than one (ie the errors are damped).
>>
>
>
> I'm not sure I believe this. It's been a while since I studied this, but I
> think I remember maintaining precision at the Frankenstein level:
> "Multiplication bad. Addition good." Keep the multiplications and divisions
> to a minimum, and try to do them at the end.
>
>
>>However, in the case of widely differing magnitudes I think
>>you need to
>>accumulate the sums specially.
>
>
>
> Well, its not even that hard:
>
> >>> data=[1.e30,1.,-1.e30]
> >>> sum(data)/len(data)
> 0.0
> >>> data.sort(lambda x,y:-cmp(abs(x),abs(y)))
> >>> data
> [-1e+030, 1e+030, 1.0]
> >>> sum(data)/len(data) #Assuming sum works the way I think it should . . .
> 0.33333333333333331
>
> But using the recursive
> >>> def mu(lst):
> n = len(lst)
> if n==1:
> return lst[0]
> else:
> return (lst[0] + ((n-1)*mu(lst[1:])))/n
>
> >>> data=[1.e30,1.,-1.e30]
> >>> mu(data)
> 0.0
> >>> data.sort(lambda x,y:-cmp(abs(x),abs(y)))
> >>> mu(data)
> 0.0
>
> it seems we lose even the ability to maintain the precision by sorting on the
> magnitude.
not so, the sort needs to be reverse absolute for the simplistic analysis to work
and even then I believe it may not suffice.
eg
>>> def mu(data):
... estimate = 0
... for i,v in enumerate(data):
... estimate = (v+i*estimate)/float(i+1)
... return estimate
...
>>> data=[1.e30,1.,-1.e30]
>>> data.sort(lambda x,y: -cmp(abs(x),abs(y)))
>>> data
[1e+030, -1e+030, 1.0]
>>> mu(data)
0.33333333333333331
Of course we now have an O(nLog(n)) average rather than O(n) so the numericists
will blow a gasket.
--
Robin Becker
More information about the Python-list
mailing list