[Numpy-discussion] numpy.mean still broken for largefloat32arrays

Sat Jul 26 05:05:08 EDT 2014

Cool, sounds  like great improvements. I can imagine that after some loop unrolling one becomes memory bound pretty soon. Is the summation guaranteed to traverse the data in its natural order? And do you happen to know what the rules for choosing accumulator dtypes are?

-----Original Message-----
From: "Julian Taylor" <jtaylor.debian at googlemail.com>
Sent: ‎26-‎7-‎2014 00:58
To: "Discussion of Numerical Python" <numpy-discussion at scipy.org>
Subject: Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

On 25.07.2014 23:51, Eelco Hoogendoorn wrote:
> Ray: I'm not working with Hubble data, but yeah these are all issues
> I've run into with my terrabytes of microscopy data as well. Given that
> such raw data comes as uint16, its best to do your calculations as much
> as possible in good old ints. What you compute is what you get, no
> obscure shenanigans.

integers are dangerous too, they overflow quickly and signed overflow is
even undefined in C the standard.

> 
> It just occurred to me that pairwise summation will lead to highly
> branchy code, and you can forget about any vector extensions. Tradeoffs
> indeed. Any such hierarchical summation is probably best done by
> aggregating naively summed blocks.

pairwise summation is usually implemented with a naive sum cutoff large
enough so the recursion does not matter much.
In numpy 1.9 this cutoff is 128 elements, but the inner loop is unrolled
8 times which makes it effectively 16 elements.
the unrolling factor of 8 was intentionally chosen to allow using AVX in
the inner loop without changing the summation ordering, but last I
tested actually using AVX here only gave mediocre speedups (10%-20% on
an i5).

> ------------------------------------------------------------------------
> From: RayS <mailto:rays at blue-cove.com>
> Sent: ‎25-‎7-‎2014 23:26
> To: Discussion of Numerical Python <mailto:numpy-discussion at scipy.org>
> Subject: Re: [Numpy-discussion] numpy.mean still broken for
> largefloat32arrays
> 
> At 11:29 AM 7/25/2014, you wrote:
>>On Fri, Jul 25, 2014 at 5:56 PM, RayS <rays at blue-cove.com> wrote:
>> > The important point was that it would be best if all of the
>> methods affected
>> > by summing 32 bit floats with 32 bit accumulators had the same Notes as
>> > numpy.mean(). We went through a lot of code yesterday, assuming that any
>> > numpy or Scipy.stats functions that use accumulators suffer the same
> issue,
>> > whether noted or not, and found it true.
>>
>>Do you have a list of the functions that are affected?
> 
> We only tested a few we used, but
> scipy.stats.nanmean, or any .*mean()
> numpy.sum, mean, average, std, var,...
> 
> via something like:
> 
> import numpy
> import scipy.stats
> print numpy.__version__
> print scipy.__version__
> onez = numpy.ones((2**25, 1), numpy.float32)
> step = 2**10
> func = scipy.stats.nanmean
> for s in range(2**24-step, 2**25, step):
>      if func(onez[:s+step])!=1.:
>          print '\nbroke', s, func(onez[:s+step])
>          break
>      else:
>          print '\r',s,
> 
>>  That said, it does seem that np.mean could be implemented better than
>>it is, even given float32's inherent limitations. If anyone wants to
>>implement better algorithms for computing the mean, variance, sums,
>>etc., then we would love to add them to numpy.
> 
> Others have pointed out the possible tradeoffs in summation algos,
> perhaps a method arg would be appropriate, "better" depending on your
> desire for speed vs. accuracy.
> 
> It just occurred to me that if the STSI folks (who count photons)
> took the mean() or other such func of an image array from Hubble
> sensors to find background value, they'd better always be using float64.
> 
>   - Ray
> 
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20140726/d564df7c/attachment.html>