[Numpy-discussion] [ANN] Nanny, faster NaN functions

Fri Nov 19 22:19:20 EST 2010

On Fri, Nov 19, 2010 at 1:50 PM, Keith Goodman <kwgoodman at gmail.com> wrote:

> On Fri, Nov 19, 2010 at 12:29 PM, Keith Goodman <kwgoodman at gmail.com>
> wrote:
> > On Fri, Nov 19, 2010 at 12:19 PM, Pauli Virtanen <pav at iki.fi> wrote:
> >> Fri, 19 Nov 2010 11:19:57 -0800, Keith Goodman wrote:
> >> [clip]
> >>> My guess is that having separate underlying functions for each dtype,
> >>> ndim, and axis would be a nightmare for a large project like Numpy. But
> >>> manageable for a focused project like nanny.
> >>
> >> Might be easier to migrate the nan* functions to using Ufuncs.
> >>
> >> Unless I'm missing something,
> >>
> >>        np.nanmax -> np.fmax.reduce
> >>        np.nanmin -> np.fmin.reduce
> >>
> >> For `nansum`, we'd need to add an ufunc `nanadd`, and for
> >> `nanargmax/min`, we'd need `argfmin/fmax'.
> >
> > How about that! I wasn't aware of fmax/fmin. Yes, I'd like a nanadd,
> please.
> >
> >>> arr = np.random.rand(1000, 1000)
> >>> arr[arr > 0.5] = np.nan
> >>> np.nanmax(arr)
> >   0.49999625409581072
> >>> np.fmax.reduce(arr, axis=None)
> > <snip>
> > TypeError: an integer is required
> >>> np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0)
> >   0.49999625409581072
> >
> >>> timeit np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0)
> > 100 loops, best of 3: 12.7 ms per loop
> >>> timeit np.nanmax(arr)
> > 10 loops, best of 3: 39.6 ms per loop
> >
> >>> timeit np.nanmax(arr, axis=0)
> > 10 loops, best of 3: 46.5 ms per loop
> >>> timeit np.fmax.reduce(arr, axis=0)
> > 100 loops, best of 3: 12.7 ms per loop
>
> Cython is faster than np.fmax.reduce.
>
> I wrote a cython version of np.nanmax, called nanmax below. (It only
> handles the 2d, float64, axis=None case, but since the array is large
> I don't think that explains the time difference).
>
> Note that fmax.reduce is slower than np.nanmax when there are no NaNs:
>
> >> arr = np.random.rand(1000, 1000)
> >> timeit np.nanmax(arr)
> 100 loops, best of 3: 5.82 ms per loop
> >> timeit np.fmax.reduce(np.fmax.reduce(arr))
> 100 loops, best of 3: 9.14 ms per loop
> >> timeit nanmax(arr)
> 1000 loops, best of 3: 1.17 ms per loop
>
> >> arr[arr > 0.5] = np.nan
>
> >> timeit np.nanmax(arr)
> 10 loops, best of 3: 45.5 ms per loop
> >> timeit np.fmax.reduce(np.fmax.reduce(arr))
> 100 loops, best of 3: 12.7 ms per loop
> >> timeit nanmax(arr)
> 1000 loops, best of 3: 1.17 ms per loop
>

There seem to be some odd hardware/compiler dependencies. I get quite a
different pattern of times:

In [1]: arr = np.random.rand(1000, 1000)

In [2]: timeit np.nanmax(arr)
100 loops, best of 3: 10.4 ms per loop

In [3]: timeit np.fmax.reduce(arr.flat)
100 loops, best of 3: 2.09 ms per loop

In [4]: arr[arr > 0.5] = np.nan

In [5]: timeit np.nanmax(arr)
100 loops, best of 3: 12.9 ms per loop

In [6]: timeit np.fmax.reduce(arr.flat)
100 loops, best of 3: 7.09 ms per loop

I've tweaked fmax with the reduce loop option but the nanmax times don't
look like yours at all. I'm also a bit surprised that
you don't see any difference in times when the array contains a lot of nans.
I'm running on AMD Phenom, gcc 4.4.5.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20101119/9c322df8/attachment.html>