[Numpy-discussion] Summation of large float32/float64 arrays

Robert Kern robert.kern at gmail.com
Fri May 21 16:26:47 EDT 2010


On Fri, May 21, 2010 at 15:13, Matthew Turk <matthewturk at gmail.com> wrote:
> Hi all,
>
> I have a possibly naive question.  I don't really understand this
> particular set of output:
>
> In [1]: import numpy
>
> In [2]: a1 = numpy.random.random((512,512,512)).astype("float32")
>
> In [3]: a1.sum(axis=0).sum(axis=0).sum(axis=0)
> Out[3]: 67110312.0
>
> In [4]: a1.sum()
> Out[4]: 16777216.0
>
> I recognize that the intermediate sums may accumulate error
> differently than a single call to .sum(), but I guess my concern is
> that it's accumulating a lot faster than I anticipated.  (Interesting
> to note that a1.sum() returns 0.5*512^3, down to the decimal; is it
> summing up the mean, which should be ~0.5?)  However, with a 256^3
> array:
>
> In [1]: import numpy
>
> In [2]: a1 = numpy.random.random((256,256,256)).astype("float32")
>
> In [3]: a1.sum(axis=0).sum(axis=0).sum(axis=0)
> Out[3]: 8389703.0
>
> In [4]: a1.sum()
> Out[4]: 8389245.0
>
> The errors are much more reasonable.  Is there an overflow or
> something that occurs with the 512^3?

It's not quite an overflow.

In [1]: from numpy import *

In [2]: x = float32(16777216.0)

In [3]: x + float32(0.9)
Out[3]: 16777216.0

You are accumulating your result in a float32. With the a.sum()
approach, you eventually hit a level where the next number to add is
always less than the relative epsilon of float32 precision. So the
result doesn't change. And will never change again as long as you only
add one number at a time. Summing along the other axes creates smaller
intermediate sums such that you are usually adding together numbers
roughly in the same regime as each other, so you don't lose as much
precision.

Use a.sum(dtype=np.float64) to use a float64 accumulator.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco



More information about the NumPy-Discussion mailing list