[Numpy-discussion] numpy.mean still broken for large float32 arrays

Sebastian Berg sebastian at sipsolutions.net
Sat Jul 26 05:15:00 EDT 2014


On Fri, 2014-07-25 at 21:23 +0200, Eelco Hoogendoorn wrote:
> It need not be exactly representable as such; take the mean of [1, 1
> + eps], for instance. Granted, there are at most two numbers in the
> range of the original dtype which are closest to the true mean; but
> I'm not sure that computing them exactly is a tractable problem for
> arbitrary input.
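
[Ed.: the quoted [1, 1 + eps] example can be checked numerically. The sketch below is illustrative and not from the original mail; it only assumes the standard NumPy API.]

```python
import numpy as np

# Sketch of the quoted point: the exact mean of [1, 1 + eps] is
# 1 + eps/2, which is not representable in float32; the two nearest
# float32 values are 1.0 and 1.0 + eps, so the computed mean must
# round to one of them.
eps = np.finfo(np.float32).eps          # 2**-23 for float32
a = np.array([1.0, 1.0 + eps], dtype=np.float32)

m = a.mean()                            # accumulated in float32
exact = 1.0 + eps / 2.0                 # true mean, as a Python float

print(m, exact)
```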
> 
<snip>
> 
> This only requires log(N) space on the stack if properly implemented,
> and it is not platform dependent, nor should it have any backward
> compatibility issues that I can think of. But I'm not sure how easy
> it would be to implement, given the current framework. The ability to
> specify different algorithms via kwarg wouldn't be a bad idea either,
> IMO; nor would the ability to explicitly specify separate output and
> accumulator dtypes.
> 
> 

Well, you can already use the dtype argument to force an upcast of both
operands. However, this currently causes a buffered upcast of the
float32 data to float64. You could also add a d,f->d inner loop to
avoid the cast, but then you would currently have to use the out
argument.
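
[Ed.: a concrete sketch of the dtype upcast described above, assuming standard NumPy semantics; the d,f->d inner loop itself is not shown here.]

```python
import numpy as np

# Passing dtype=np.float64 makes mean() accumulate (and return) in
# double precision, at the cost of a buffered upcast of the float32
# input data.
a = np.full(10**6, 0.1, dtype=np.float32)

m32 = a.mean()                   # float32 accumulation and result
m64 = a.mean(dtype=np.float64)   # upcast: float64 accumulation and result

print(m32.dtype, m64.dtype)
# m64 recovers, to double precision, the float32 value nearest 0.1.
print(abs(m64 - np.float64(np.float32(0.1))))
```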

In any case, the real solution here is, in my opinion, what I think
most of us already agreed would be good: a keyword argument, or maybe
a context (though I am unsure about the details with threading, etc.),
to choose more stable algorithms for such statistical functions. The
pairwise summation that is in master now is very awesome, but it is
not reliable enough, in the sense that a new user will have difficulty
knowing when it is actually used.
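
[Ed.: for reference, a minimal sketch of pairwise (cascade) summation versus naive left-to-right accumulation, both emulating a float32 accumulator. The function names and block size are illustrative, not NumPy's internals.]

```python
import numpy as np

def naive_sum_f32(x):
    """Left-to-right accumulation with a float32 accumulator."""
    acc = np.float32(0.0)
    for v in x:
        acc = np.float32(acc + np.float32(v))
    return acc

def pairwise_sum_f32(x, block=8):
    """Recursive pairwise summation in float32; needs O(log N) stack."""
    n = len(x)
    if n <= block:
        return naive_sum_f32(x)
    half = n // 2
    return np.float32(pairwise_sum_f32(x[:half], block)
                      + pairwise_sum_f32(x[half:], block))

data = [0.1] * 10**5
# Reference value in double precision: 10**5 copies of float32(0.1).
true = np.float64(np.float32(0.1)) * 10**5

err_naive = abs(np.float64(naive_sum_f32(data)) - true)
err_pair = abs(np.float64(pairwise_sum_f32(data)) - true)
print(err_naive, err_pair)   # pairwise error is far smaller
```

The naive loop accumulates a systematic rounding drift once the partial sum dwarfs each addend, while pairwise summation keeps operands of similar magnitude, which is why its error grows only logarithmically with N.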

- Sebastian

> 
> On Fri, Jul 25, 2014 at 8:00 PM, Alan G Isaac <alan.isaac at gmail.com>
> wrote:
>         On 7/25/2014 1:40 PM, Eelco Hoogendoorn wrote:
>         > At the risk of repeating myself: explicit is better than
>         > implicit
<snip>




