[SciPy-User] [ANN] Bottleneck 0.2

Tue Dec 28 08:42:21 EST 2010

On 12/27/2010 09:04 PM, Keith Goodman wrote:
> Bottleneck is a collection of fast NumPy array functions written in Cython.
>
> The second release of Bottleneck is faster, contains more functions,
> and supports more dtypes.
>

Here is another special case you could add, if you want: declare the
arrays with mode='c' when the operation runs along the last axis and
arr.flags.c_contiguous is True.
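
A minimal sketch of that check, in plain Python rather than Cython. The
helper name can_use_c_mode is hypothetical and not part of Bottleneck; it
only shows the condition under which a buffer declared along the lines of
np.ndarray[np.float64_t, ndim=2, mode='c'] would be safe to use.

```python
import numpy as np

def can_use_c_mode(arr, axis):
    """Return True when a reduction along `axis` runs over the last axis
    of a C-contiguous array (assumes arr.ndim >= 1)."""
    axis = axis % arr.ndim  # normalize negative axes such as -1
    return bool(arr.flags.c_contiguous) and axis == arr.ndim - 1

a = np.arange(12.0).reshape(3, 4)
print(can_use_c_mode(a, 1))     # True: last axis, C-contiguous
print(can_use_c_mode(a, 0))     # False: not the last axis
print(can_use_c_mode(a.T, 1))   # False: the transpose is not C-contiguous
```

Inputs that fail the check would keep using the existing, more general
declarations.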

Dag Sverre

> Faster:
> - All functions faster (less overhead) when output is not a scalar
> - Faster nanmean() for 2d, 3d arrays containing NaNs when axis is not None
>
> New functions:
> - nanargmin()
> - nanargmax()
> - nanmedian(), 100X faster than SciPy's nanmedian for (100,100) input, axis=0
>
> Enhancements:
> - Added support for float32
> - Fallback to slower, non-Cython functions for unaccelerated ndim/dtype
> - SciPy is no longer a dependency
> - Added support for older versions of NumPy (1.4.1)
> - All functions are now templated for dtype and axis
> - Added a sandbox for prototyping of new Bottleneck functions
> - Rewrote benchmarking code
>
> Breaks from 0.1.0:
> - To run the benchmark, use bn.bench() instead of bn.benchit()
>
> download
>      http://pypi.python.org/pypi/Bottleneck
> docs
>      http://berkeleyanalytics.com/bottleneck
> code
>      http://github.com/kwgoodman/bottleneck
> mailing list
>      http://groups.google.com/group/bottle-neck
> mailing list 2
>      http://mail.scipy.org/mailman/listinfo/scipy-user
>
> Bottleneck comes with a benchmark suite that compares the performance
> of each Bottleneck function with that of its NumPy/SciPy equivalent,
> where one exists. To run the benchmark:
>
>      >>>  bn.bench(mode='fast')
>      Bottleneck performance benchmark
>          Bottleneck  0.2.0
>          Numpy (np)  1.5.1
>          Scipy (sp)  0.8.0
>          Speed is NumPy or SciPy time divided by Bottleneck time
>          NaN means one-third NaNs; axis=0 and float64 are used
>      median vs np.median
>          3.59  (10,10)
>          2.43  (1001,1001)
>          2.28  (1000,1000)
>          2.16  (100,100)
>      nanmedian vs local copy of sp.stats.nanmedian
>        102.72  (10,10)      NaN
>         94.34  (10,10)
>         67.89  (100,100)    NaN
>         28.52  (100,100)
>          6.37  (1000,1000)  NaN
>          4.41  (1000,1000)
>      nanmax vs np.nanmax
>          9.99  (100,100)    NaN
>          6.12  (10,10)      NaN
>          5.99  (10,10)
>          5.88  (100,100)
>          1.79  (1000,1000)  NaN
>          1.76  (1000,1000)
>      nanmean vs local copy of sp.stats.nanmean
>         25.95  (100,100)    NaN
>         12.85  (100,100)
>         12.26  (10,10)      NaN
>         11.89  (10,10)
>          5.15  (1000,1000)  NaN
>          3.17  (1000,1000)
>      nanstd vs local copy of sp.stats.nanstd
>         16.96  (100,100)    NaN
>         15.75  (10,10)      NaN
>         15.49  (10,10)
>          9.51  (100,100)
>          3.85  (1000,1000)  NaN
>          2.82  (1000,1000)
>      nanargmax vs np.nanargmax
>          8.60  (100,100)    NaN
>          5.65  (10,10)      NaN
>          5.62  (100,100)
>          5.44  (10,10)
>          2.84  (1000,1000)  NaN
>          2.58  (1000,1000)
>      move_nanmean vs sp.ndimage.convolve1d based function
>          window = 5
>         19.52  (10,10)      NaN
>         18.55  (10,10)
>         10.56  (100,100)    NaN
>          6.67  (100,100)
>          5.19  (1000,1000)  NaN
>          4.42  (1000,1000)
>
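
For anyone who wants to reproduce one of these ratios by hand, here is a
rough sketch of the metric described in the header above: reference time
divided by Bottleneck time, on an array with about one-third NaNs. The
helper speed_ratio and its repeat/number settings are assumptions made for
illustration, not the code behind bn.bench(). If Bottleneck is not
installed, the sketch falls back to timing np.nanmax against itself.

```python
import timeit
import numpy as np

try:
    import bottleneck as bn
    candidate = bn.nanmax
except ImportError:
    candidate = np.nanmax  # stand-in so the sketch still runs

def speed_ratio(reference, candidate, arr, axis=0, repeat=3, number=20):
    """Best reference time divided by best candidate time."""
    t_ref = min(timeit.repeat(lambda: reference(arr, axis=axis),
                              repeat=repeat, number=number))
    t_cand = min(timeit.repeat(lambda: candidate(arr, axis=axis),
                               repeat=repeat, number=number))
    return t_ref / t_cand

rng = np.random.default_rng(0)
arr = rng.standard_normal((100, 100))     # float64 by default
arr[rng.random(arr.shape) < 1 / 3] = np.nan  # roughly one-third NaNs

ratio = speed_ratio(np.nanmax, candidate, arr, axis=0)
print("nanmax vs np.nanmax: %.2f  %s  NaN" % (ratio, arr.shape))
```

A ratio above 1 means the candidate is faster than the reference. The
numbers will differ from the table above depending on hardware and on the
library versions installed.
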
> Under the hood Bottleneck uses a separate Cython function for each
> combination of ndim, dtype, and axis. A lot of the overhead in
> bn.nanmax(), for example, is in checking that the axis is within
> range, converting non-array data to an array, and selecting the
> function to use to calculate the maximum. You can get rid of the
> overhead by calling the underlying Cython function directly.
>
> Benchmarks for the low-level Cython version of each function:
>
>      >>>  bn.bench(mode='faster')
>      Bottleneck performance benchmark
>          Bottleneck  0.2.0
>          Numpy (np)  1.5.1
>          Scipy (sp)  0.8.0
>          Speed is NumPy or SciPy time divided by Bottleneck time
>          NaN means one-third NaNs; axis=0 and float64 are used
>      median_selector vs np.median
>         15.29  (10,10)
>         14.19  (100,100)
>          8.04  (1001,1001)
>          7.32  (1000,1000)
>      nanmedian_selector vs local copy of sp.stats.nanmedian
>        352.08  (10,10)      NaN
>        340.27  (10,10)
>        185.56  (100,100)    NaN
>        138.81  (100,100)
>          8.21  (1000,1000)
>          8.09  (1000,1000)  NaN
>      nanmax_selector vs np.nanmax
>         21.54  (10,10)      NaN
>         19.98  (10,10)
>         12.65  (100,100)    NaN
>          6.82  (100,100)
>          1.79  (1000,1000)  NaN
>          1.76  (1000,1000)
>      nanmean_selector vs local copy of sp.stats.nanmean
>         41.08  (10,10)      NaN
>         39.05  (10,10)
>         31.74  (100,100)    NaN
>         15.24  (100,100)
>          5.13  (1000,1000)  NaN
>          3.16  (1000,1000)
>      nanstd_selector vs local copy of sp.stats.nanstd
>         44.55  (10,10)      NaN
>         43.49  (10,10)
>         18.66  (100,100)    NaN
>         10.29  (100,100)
>          3.83  (1000,1000)  NaN
>          2.82  (1000,1000)
>      nanargmax_selector vs np.nanargmax
>         17.91  (10,10)      NaN
>         17.00  (10,10)
>         10.56  (100,100)    NaN
>          6.50  (100,100)
>          2.85  (1000,1000)  NaN
>          2.59  (1000,1000)
>      move_nanmean_selector vs sp.ndimage.convolve1d based function
>          window = 5
>         55.96  (10,10)      NaN
>         50.82  (10,10)
>         11.77  (100,100)    NaN
>          6.93  (100,100)
>          5.56  (1000,1000)  NaN
>          4.51  (1000,1000)