[Numpy-discussion] numpy.mean still broken for largefloat32arrays
Sebastian Berg
sebastian at sipsolutions.net
Mon Jul 28 11:22:33 EDT 2014
On Mo, 2014-07-28 at 16:31 +0200, Eelco Hoogendoorn wrote:
> Sebastian:
>
>
> Those are good points. Indeed iteration order may already produce
> different results, even though the semantics of numpy suggest
> identical operations. Still, I feel this different behavior without
> any semantical clues is something to be minimized.
>
> Indeed copying might have large speed implications. But on second
> thought, does it? Either the data is already aligned and no copy is
> required, or it isn't aligned, and we need one pass of cache
> inefficient access to the data anyway. Infact, if we had one low level
> function which does cache-intelligent transposition of numpy arrays
> (using some block strategy), this might be faster even than the
> current simple reduction operations when forced to work on awkwardly
> aligned data. Ideally, this intelligent access and intelligent
> reduction would be part of a single pass of course; but that wouldn't
> really fit within the numpy design, and merely such an intelligent
> transpose would provide most of the benefit I think. Or is the
> mechanism behind ascontiguousarray already intelligent in this sense?
>
The iterator is currently smart in the sense that it will (obviously low
level), do something like [1]. Most things in numpy use that iterator,
ascontiguousarray does so as well. Such a blocked cache aware iterator
is what I mean by, someone who knows it would have to spend a lot of
time on it :).
[1] Appendix:
arr = np.ones((100, 100))
arr.sum(1)
# being equivalent (iteration order wise) to:
res = np.zeros(100)
for i in range(100):
res += arr[i, :]
# while arr.sum(0) would be:
for i in range(100):
res[i] = arr[i, :].sum()
>
> On Mon, Jul 28, 2014 at 4:06 PM, Sebastian Berg
> <sebastian at sipsolutions.net> wrote:
> On Mo, 2014-07-28 at 15:35 +0200, Sturla Molden wrote:
> > On 28/07/14 15:21, alex wrote:
> >
> > > Are you sure they always give different results? Notice
> that
> > > np.ones((N,2)).mean(0)
> > > np.ones((2,N)).mean(1)
> > > compute means of different axes on transposed arrays so
> these
> > > differences 'cancel out'.
> >
> > They will be if different algorithms are used.
> np.ones((N,2)).mean(0)
> > will have larger accumulated rounding error than
> np.ones((2,N)).mean(1),
> > if only the latter uses the divide-and-conquer summation.
> >
>
>
> What I wanted to point out is that to some extend the
> algorithm does not
> matter. You will not necessarily get identical results already
> if you
> use a different iteration order, and we have been doing that
> for years
> for speed reasons. All libs like BLAS do the same.
> Yes, the new changes make this much more dramatic, but they
> only make
> some paths much better, never worse. It might be dangerous,
> but only in
> the sense that you test it with the good path and it works
> good enough,
> but later (also) use the other one in some lib. I am not even
> sure if I
>
> > I would suggest that in the first case we try to copy the
> array to a
> > temporary contiguous buffer and use the same
> divide-and-conquer
> > algorithm, unless some heuristics on memory usage fails.
> >
>
>
> Sure, but you have to make major changes to the buffered
> iterator to do
> that without larger speed implications. It might be a good
> idea, but it
> requires someone who knows this stuff to spend a lot of time
> and care in
> the depths of numpy.
>
> > Sturla
> >
> >
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
More information about the NumPy-Discussion
mailing list