[SciPy-User] [ANN] Bottleneck 0.2

Tue Dec 28 11:57:07 EST 2010

On Tue, Dec 28, 2010 at 5:42 AM, Dag Sverre Seljebotn
<dagss at student.matnat.uio.no> wrote:
> On 12/27/2010 09:04 PM, Keith Goodman wrote:
>> Bottleneck is a collection of fast NumPy array functions written in Cython.
>>
>> The second release of Bottleneck is faster, contains more functions,
>> and supports more dtypes.
>>
>
> Another special case for you if you want: It seems that you could add
> the case of "mode='c'" to the array declarations, in the case that the
> operation goes along the last axis and arr.flags.c_contiguous == True.

Wow! That works great for large input arrays:

>> a = np.random.rand(1000,1000)
>> timeit bn.func.nanmean_2d_float64_axis1(a)
1000 loops, best of 3: 1.52 ms per loop
>> timeit a.flags.c_contiguous == True; bn.func.nanmean_2d_float64_ccontiguous_axis1(a)
1000 loops, best of 3: 1.18 ms per loop

And for medium arrays:

>> a = np.random.rand(100,100)
>> timeit bn.func.nanmean_2d_float64_axis1(a)
100000 loops, best of 3: 16.3 us per loop
>> timeit a.flags.c_contiguous == True; bn.func.nanmean_2d_float64_ccontiguous_axis1(a)
100000 loops, best of 3: 13.3 us per loop

But the overhead of checking for C contiguity slows things down for
small arrays:

>> a = np.random.rand(10,10)
>> timeit bn.func.nanmean_2d_float64_axis1(a)
1000000 loops, best of 3: 1.28 us per loop
>> timeit a.flags.c_contiguous == True; bn.func.nanmean_2d_float64_ccontiguous_axis1(a)
1000000 loops, best of 3: 1.55 us per loop
>> timeit a.flags.c_contiguous == True
1000000 loops, best of 3: 201 ns per loop
>> timeit a.flags.c_contiguous
10000000 loops, best of 3: 158 ns per loop

Plus I'd have to check if the axis is the last one.
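
For what it's worth, here is a rough sketch of what the two checks
(last axis, C contiguous) could look like inside a high-level function.
It is plain Python/NumPy, not Bottleneck code: the names
_nanmean_generic, _nanmean_ccontiguous_lastaxis, use_fast_path and
nanmean_dispatch are hypothetical, and both "kernels" are just
np.nanmean stand-ins for the compiled versions.

```python
import numpy as np


def _nanmean_generic(a, axis):
    # Stand-in for a general strided kernel (hypothetical).
    return np.nanmean(a, axis=axis)


def _nanmean_ccontiguous_lastaxis(a):
    # Stand-in for a kernel declared with mode='c' (hypothetical).
    return np.nanmean(a, axis=-1)


def use_fast_path(a, axis):
    # Normalize negative axes so that -1 and a.ndim - 1 compare equal.
    if axis < 0:
        axis += a.ndim
    # The integer comparison comes first, so the flags lookup is
    # skipped whenever the axis is not the last one.
    return axis == a.ndim - 1 and a.flags.c_contiguous


def nanmean_dispatch(a, axis):
    if use_fast_path(a, axis):
        return _nanmean_ccontiguous_lastaxis(a)
    return _nanmean_generic(a, axis)
```

Both checks run on every call, so for tiny arrays they would cost about
as much as they save, which is the problem shown in the timings above.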

That's a big speedup (roughly 20% in the timings above) for hand-coded
functions and large input arrays. But I'm not sure how to take
advantage of it in general-use functions. One option is to provide the
low-level functions (like nanmean_2d_float64_ccontiguous_axis1) but
not use them in the high-level function nanmean.
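
A sketch of how hand-written code might use such a low-level function:
validate the data once, then call the function many times with no
per-call checks. This is only an illustration under assumptions. The
function body below is a NumPy stand-in, not the Cython kernel, and
rowwise_nanmeans is a hypothetical caller.

```python
import numpy as np


def nanmean_2d_float64_ccontiguous_axis1(a):
    # NumPy stand-in for the low-level kernel named in this thread.
    # Like a low-level function, it does no input checking itself.
    return np.nanmean(a, axis=1)


def rowwise_nanmeans(stack):
    # Hypothetical caller: validate the whole 3d stack once ...
    if not (stack.ndim == 3 and stack.dtype == np.float64
            and stack.flags.c_contiguous):
        raise ValueError("expected a C-contiguous 3d float64 array")
    out = np.empty(stack.shape[:2])
    # ... then every stack[i] is a C-contiguous 2d float64 view, so
    # the low-level function can be called without further checks.
    for i in range(stack.shape[0]):
        out[i] = nanmean_2d_float64_ccontiguous_axis1(stack[i])
    return out
```

The cost of the check is then paid once per stack instead of once per
small array.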

I also tried using mode='c' when initializing the output array, but I
did not see any speed difference, perhaps because the output array is
so much smaller than the input (for a square 2d input, its size is the
square root of the input size). So I tried it with a non-reducing
function, move_nanmean, where the output is as large as the input. I
still didn't see any speed difference. No idea why.
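
To make the size argument concrete, here is a small NumPy-only sketch
comparing the output size of a reducing function with that of a
non-reducing one. move_nanmean_sketch is a hypothetical pure-NumPy
stand-in written for this illustration; it is not Bottleneck's
move_nanmean and makes no claim about how that function is implemented.

```python
import numpy as np


def move_nanmean_sketch(a, window):
    # Hypothetical moving-window nanmean along the last axis.
    # Positions without a full window are left as NaN.
    out = np.full(a.shape, np.nan)
    for j in range(window - 1, a.shape[-1]):
        out[..., j] = np.nanmean(a[..., j - window + 1:j + 1], axis=-1)
    return out


a = np.random.rand(1000, 1000)
reduced = np.nanmean(a, axis=1)    # reducing: one value per row
moved = move_nanmean_sketch(a, 5)  # non-reducing: same shape as input

# Prints: 1000000 1000 1000000
print(a.size, reduced.size, moved.size)
```

In the reducing case only 1000 values are written for 1,000,000 values
read, so a faster output array can barely matter. In the non-reducing
case reads and writes are the same size.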