[Numpy-discussion] [ANN] Nanny, faster NaN functions

Keith Goodman kwgoodman at gmail.com
Fri Nov 19 22:42:33 EST 2010


On Fri, Nov 19, 2010 at 7:19 PM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>
>
> On Fri, Nov 19, 2010 at 1:50 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
>>
>> On Fri, Nov 19, 2010 at 12:29 PM, Keith Goodman <kwgoodman at gmail.com>
>> wrote:
>> > On Fri, Nov 19, 2010 at 12:19 PM, Pauli Virtanen <pav at iki.fi> wrote:
>> >> Fri, 19 Nov 2010 11:19:57 -0800, Keith Goodman wrote:
>> >> [clip]
>> >>> My guess is that having separate underlying functions for each dtype,
>> >>> ndim, and axis would be a nightmare for a large project like Numpy.
>> >>> But
>> >>> manageable for a focused project like nanny.
>> >>
>> >> Might be easier to migrate the nan* functions to using Ufuncs.
>> >>
>> >> Unless I'm missing something,
>> >>
>> >>        np.nanmax -> np.fmax.reduce
>> >>        np.nanmin -> np.fmin.reduce
>> >>
>> >> For `nansum`, we'd need to add a ufunc `nanadd`, and for
>> >> `nanargmax/min`, we'd need `argfmin/fmax`.
>> >
>> > How about that! I wasn't aware of fmax/fmin. Yes, I'd like a nanadd,
>> > please.
>> >
>> >>> arr = np.random.rand(1000, 1000)
>> >>> arr[arr > 0.5] = np.nan
>> >>> np.nanmax(arr)
>> >   0.49999625409581072
>> >>> np.fmax.reduce(arr, axis=None)
>> > <snip>
>> > TypeError: an integer is required
>> >>> np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0)
>> >   0.49999625409581072
>> >
>> >>> timeit np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0)
>> > 100 loops, best of 3: 12.7 ms per loop
>> >>> timeit np.nanmax(arr)
>> > 10 loops, best of 3: 39.6 ms per loop
>> >
>> >>> timeit np.nanmax(arr, axis=0)
>> > 10 loops, best of 3: 46.5 ms per loop
>> >>> timeit np.fmax.reduce(arr, axis=0)
>> > 100 loops, best of 3: 12.7 ms per loop
>>
>> Cython is faster than np.fmax.reduce.
>>
>> I wrote a Cython version of np.nanmax, called nanmax below. (It only
>> handles the 2d, float64, axis=None case, but since the array is large
>> I don't think that explains the time difference.)
>>
>> Note that fmax.reduce is slower than np.nanmax when there are no NaNs:
>>
>> >> arr = np.random.rand(1000, 1000)
>> >> timeit np.nanmax(arr)
>> 100 loops, best of 3: 5.82 ms per loop
>> >> timeit np.fmax.reduce(np.fmax.reduce(arr))
>> 100 loops, best of 3: 9.14 ms per loop
>> >> timeit nanmax(arr)
>> 1000 loops, best of 3: 1.17 ms per loop
>>
>> >> arr[arr > 0.5] = np.nan
>>
>> >> timeit np.nanmax(arr)
>> 10 loops, best of 3: 45.5 ms per loop
>> >> timeit np.fmax.reduce(np.fmax.reduce(arr))
>> 100 loops, best of 3: 12.7 ms per loop
>> >> timeit nanmax(arr)
>> 1000 loops, best of 3: 1.17 ms per loop
>
> There seem to be some odd hardware/compiler dependencies. I get quite a
> different pattern of times:
>
> In [1]: arr = np.random.rand(1000, 1000)
>
> In [2]: timeit np.nanmax(arr)
> 100 loops, best of 3: 10.4 ms per loop
>
> In [3]: timeit np.fmax.reduce(arr.flat)
> 100 loops, best of 3: 2.09 ms per loop
>
> In [4]: arr[arr > 0.5] = np.nan
>
> In [5]: timeit np.nanmax(arr)
> 100 loops, best of 3: 12.9 ms per loop
>
> In [6]: timeit np.fmax.reduce(arr.flat)
> 100 loops, best of 3: 7.09 ms per loop
>
>
> I've tweaked fmax with the reduce loop option but the nanmax times don't
> look like yours at all. I'm also a bit surprised that
> you don't see any difference in times when the array contains a lot of nans.
> I'm running on AMD Phenom, gcc 4.4.5.

Ubuntu 10.04 64 bit, numpy 1.4.1.

Difference in which times? nanny.nanmax with and without NaNs? The
code doesn't explicitly check for NaNs (it does check for all NaNs).
It basically loops through the data and does:

allnan = 1
ai = arr[i, k]
if ai > amax:
    amax = ai
    allnan = 0
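
Spelled out in plain Python it would look something like this (just a
sketch; the real thing is Cython with typed loop indices, and the
all-NaN return value shown here is only one possible choice):

import numpy as np

def nanmax_2d(arr):
    # 2d, float64, axis=None case, like the Cython version above
    n0, n1 = arr.shape
    amax = -np.inf
    allnan = 1
    for i in range(n0):
        for k in range(n1):
            ai = arr[i, k]
            if ai > amax:  # comparisons with NaN are False, so NaNs never win
                amax = ai
                allnan = 0
    if allnan:
        # nothing compared greater than -inf, so treat it as all-NaN
        # (an array of all -inf would also land here in this simple sketch)
        return np.nan
    return amax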

I should make a benchmark suite.
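
Something quick based on timeit would probably do, along these lines
(the function list and array setup are just placeholders, and it
assumes nanny is importable):

import timeit

setup = """
import numpy as np
import nanny
arr = np.random.rand(1000, 1000)
arr[arr > 0.5] = np.nan
"""

candidates = [
    ('np.nanmax', 'np.nanmax(arr)'),
    ('fmax.reduce', 'np.fmax.reduce(np.fmax.reduce(arr))'),
    ('nanny.nanmax', 'nanny.nanmax(arr)'),
]

for name, stmt in candidates:
    # best of 3 repeats, 10 calls each, reported per call
    best = min(timeit.repeat(stmt, setup, repeat=3, number=10)) / 10
    print('%-15s %6.2f ms per loop' % (name, best * 1000))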


