[Numpy-discussion] numpy.mean still broken for large float32 arrays

Nathaniel Smith njs at pobox.com
Fri Jul 25 14:29:16 EDT 2014


On Fri, Jul 25, 2014 at 5:56 PM, RayS <rays at blue-cove.com> wrote:
> The important point was that it would be best if all of the methods affected
> by summing 32 bit floats with 32 bit accumulators had the same Notes as
> numpy.mean(). We went through a lot of code yesterday, assuming that any
> numpy or Scipy.stats functions that use accumulators suffer the same issue,
> whether noted or not, and found it true.

Do you have a list of the functions that are affected?

> "Depending on the input data, this can cause the results to be inaccurate,
> especially for float32 (see example below). Specifying a higher-precision
> accumulator using the dtype keyword can alleviate this issue." seems rather
> un-Pythonic.

It's true that in its full generality, this problem just isn't
something numpy can solve. Using float32 is extremely dangerous and
should not be attempted unless you're prepared to seriously analyze
all your code for numeric stability; IME it often runs into problems
in practice, in any number of ways. Remember that it only has as much
precision as a 24 bit integer. There are good reasons why float64 is
the default!
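[Editor's illustration, not part of the original email: the 24-bit limit mentioned above can be seen directly, along with the dtype workaround the docstring suggests.]

```python
import numpy as np

# float32 carries a 24-bit significand, so 2**24 is the last point at
# which consecutive integers are distinguishable; adding 1 is simply
# lost to rounding.
a = np.float32(2**24)            # 16777216
print(a + np.float32(1) == a)    # True: the increment vanishes

# The same rounding corrupts a float32 running sum. Forcing a
# float64 accumulator via the dtype keyword, as the np.mean
# docstring recommends, sidesteps the problem:
x = np.full(10**6, 0.1, dtype=np.float32)
print(x.mean(dtype=np.float64))  # close to 0.1
```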

That said, it does seem that np.mean could be implemented better than
it is, even given float32's inherent limitations. If anyone wants to
implement better algorithms for computing the mean, variance, sums,
etc., then we would love to add them to numpy. I'd suggest
implementing them as gufuncs -- there are examples of defining gufuncs
in numpy/linalg/umath_linalg.c.src.
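[Editor's sketch, not from the email: one well-known "better algorithm" of the kind being suggested is pairwise summation, which keeps rounding-error growth at O(log n) rather than the O(n) of a naive left-to-right sum. This is a pure-Python illustration of the idea, not the C gufunc implementation Nathaniel proposes.]

```python
import numpy as np

def pairwise_sum(x, block=8):
    """Sum a 1-D array by recursive halving.

    Below `block` elements, fall back to a naive left-to-right
    float32 accumulation; above it, split in half and add the two
    partial sums. Intermediate sums stay comparable in magnitude,
    which limits cancellation and absorption error.
    """
    n = len(x)
    if n <= block:
        s = np.float32(0.0)
        for v in x:
            s = np.float32(s + v)
        return s
    mid = n // 2
    return np.float32(pairwise_sum(x[:mid], block) + pairwise_sum(x[mid:], block))
```

A mean built on this is just `pairwise_sum(x) / len(x)`; the real work in a NumPy contribution would be doing the same in C with strided-memory support.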

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
