[Numpy-discussion] Possible improvement to numpy.mean()

Bruce Southey bsouthey at gmail.com
Tue Apr 6 17:31:01 EDT 2010


On 04/06/2010 04:07 PM, Michael Gilbert wrote:
> Hi,
>
> I am applying Monte Carlo for a problem involving mixed deterministic
> and random values.  In order to avoid a lot of special handling and
> corner cases, I am using numpy arrays full of a single value to
> represent the deterministic quantities.
>
> Anyway, I found that the standard deviation turns out to be non-zero
> when these deterministic sets take on large values, which is wrong.
> This is due to machine precision loss.
>
> It turns out to be fairly straightforward to check for this situation
> upfront. See attached code. I've also shown a more accurate algorithm
> for computing the mean, but it adds an additional multiplication for
> every term in the sum, which is obviously undesirable from a
> performance perspective. Would it make sense to automatically detect
> the precision loss and use the more accurate approach when that is the
> case?
>
> If that seems ok, I can take a look at the numpy code, and submit a
> patch.
>
> Best wishes,
> Mike
>    
>
Hi,
Actually, you fail to check the reverse situation, where your 'norm' 
value is greater than the expected eps value. In any case, these 
differences are within the expected numerical precision, even for 
float128. In the last two cases you are unfairly comparing float64 
arrays with float128 arrays; it would be fairer to pass float128 as 
the dtype argument to the mean function.
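For example (a rough, untested sketch, not your attached code; the
example value is arbitrary, and numpy only exposes float128 on
platforms with a wider long double):

import numpy as np

# A 'deterministic' set: every entry is the same large value. The
# value is chosen so that partial sums do not stay exactly
# representable in float64.
a = np.ones(1000) * (np.pi * 1e15)

# The running float64 sum outgrows the 53-bit mantissa, so the mean
# is not exactly the input value and the standard deviation can come
# out non-zero:
print(a.std())

# Accumulating in extended precision (where numpy provides float128)
# is the fairer comparison:
print(a.std(dtype=np.float128))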

Also, any patch has to work with the axis argument, so the 
'pre-division by the number of observations' has to be done along the 
user-selected axis. For some inputs I expect the pre-division to 
introduce its own numerical imprecision - try using the reciprocals 
of your inputs.
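Something along these lines is what I have in mind (an untested
sketch; the function name is only for illustration, and it only
handles a single integer axis):

import numpy as np

def predivided_mean(a, axis=None):
    # Hypothetical 'pre-division' mean: divide each element by the
    # number of observations along the selected axis before summing,
    # so the running sum stays small.
    a = np.asarray(a)
    n = a.size if axis is None else a.shape[axis]
    return (a / n).sum(axis=axis)

# Each pre-division rounds, so for some inputs - e.g. the reciprocals
# of your large values - this can be no better than the plain mean:
x = 1.0 / (np.ones(1000) * (np.pi * 1e15))
print(predivided_mean(x), x.mean())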

Bruce
