[Numpy-discussion] A faster median (Wirth's method)

Keith Goodman kwgoodman at gmail.com
Tue Nov 30 14:21:24 EST 2010


On Tue, Sep 1, 2009 at 2:37 PM, Sturla Molden <sturla at molden.no> wrote:
> Dag Sverre Seljebotn skrev:
>>
>> Nitpick: This will fail on large arrays. I guess numpy.npy_intp is the
>> right type to use in this case?
>>
> By the way, here is a more polished version, does it look ok?
>
> http://projects.scipy.org/numpy/attachment/ticket/1213/generate_qselect.py
> http://projects.scipy.org/numpy/attachment/ticket/1213/quickselect.pyx

This is my favorite numpy/scipy ticket, so I am happy that I can
contribute in a small way by pointing out a bug. The search for the
k-th smallest element is done only over the first k elements instead
of over the entire array; that's the bug. Specifically, "while l < k"
should be "while l < r".
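For illustration, here is a pure-Python sketch of Wirth's selection algorithm with the corrected loop condition. It is not the code from quickselect.pyx; the function name kth_smallest and its arguments are hypothetical, and k is assumed to be a 0-based index into a non-empty list that gets partially reordered in place.

```python
def kth_smallest(a, k):
    """Return the k-th smallest element (0-based) of list ``a``.

    Sketch of Wirth's selection; ``a`` is partially reordered in place.
    """
    l, r = 0, len(a) - 1
    while l < r:  # the fix: shrink the window [l, r] until it collapses, not "l < k"
        x = a[k]  # pivot value
        i, j = l, r
        while True:  # Hoare-style partition around x (a do-while loop in C)
            while a[i] < x:
                i += 1
            while x < a[j]:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
            if i > j:
                break
        if j < k:
            l = i  # the k-th element is right of the left part
        if k < i:
            r = j  # the k-th element is left of the right part
    return a[k]
```

With the buggy condition "while l < k", a call such as kth_smallest([3, 1, 2], 0) would skip the loop entirely and return 3 instead of 1.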

I added a median function to the Bottleneck package:
https://github.com/kwgoodman/bottleneck

Timings:

>> import numpy as np
>> import bottleneck as bn
>> arr = np.random.rand(100, 100)
>> timeit np.median(arr)
1000 loops, best of 3: 762 us per loop
>> timeit bn.median(arr)
10000 loops, best of 3: 198 us per loop
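As a sketch of how a median falls out of selection (not Bottleneck's implementation): an odd-length median is one order statistic, and an even-length median averages the two middle ones. The example below assumes NumPy 1.8 or later, where selection is exposed as numpy.partition; the name median_by_selection is hypothetical.

```python
import numpy as np

def median_by_selection(arr):
    """Median of all elements of ``arr`` using selection instead of a full sort."""
    a = np.asarray(arr, dtype=np.float64).ravel()
    n = a.size
    if n == 0:
        raise ValueError("median of empty array")
    half = n // 2
    if n % 2:
        # Odd length: the middle order statistic is the median.
        return np.partition(a, half)[half]
    # Even length: one partition call with two kth values puts both middle
    # order statistics in their sorted positions; average them.
    part = np.partition(a, [half - 1, half])
    return 0.5 * (part[half - 1] + part[half])
```

Selection does O(n) expected work versus O(n log n) for a full sort, which is where a selection-based median gets its edge over a sort-based one.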

What other functions could be built from a selection algorithm?

nanmedian
scoreatpercentile
quantile
knn
select
others?
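Two items from the list above can be sketched on top of selection: a quantile (or score at a percentile) needs only the two order statistics around the fractional rank q * (n - 1), and a nanmedian can drop NaNs before selecting. This sketch assumes NumPy's numpy.partition and the linear interpolation used by NumPy's default quantile method; the function names are hypothetical.

```python
import numpy as np

def quantile_by_selection(arr, q):
    """q-th quantile (0 <= q <= 1), linearly interpolating between order statistics."""
    a = np.asarray(arr, dtype=np.float64).ravel()
    if a.size == 0:
        raise ValueError("quantile of empty array")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    pos = q * (a.size - 1)            # fractional rank
    lo = int(np.floor(pos))
    hi = min(lo + 1, a.size - 1)
    # Select only the one or two neighbouring order statistics.
    part = np.partition(a, [lo, hi] if hi > lo else lo)
    frac = pos - lo
    return part[lo] + frac * (part[hi] - part[lo])

def nanmedian_by_selection(arr):
    """Median that ignores NaNs: drop them, then select (NaN if nothing is left)."""
    a = np.asarray(arr, dtype=np.float64).ravel()
    a = a[~np.isnan(a)]
    if a.size == 0:
        return np.nan
    return quantile_by_selection(a, 0.5)
```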

But before I add more functions to the package, I need to figure out
how to make a cython apply_along_axis function. For the first release
I am hand-coding the 1d, 2d, and 3d cases: boring to write, hard to
maintain, and it doesn't solve the nd case.
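One common way to cover the nd case with a single code path is to move the reduced axis to the end and view the remaining axes as one, so every nd reduction becomes a loop over the rows of a 2d array. The sketch below shows that strategy in plain Python/NumPy, not Cython; reduce_along_axis is a hypothetical name, it assumes a NumPy with numpy.moveaxis (1.11 or later), and it stores results as float64.

```python
import numpy as np

def reduce_along_axis(func1d, arr, axis=-1):
    """Apply a 1d reducing function along ``axis`` of an n-d array."""
    a = np.moveaxis(np.asarray(arr), axis, -1)  # the reduced axis is now last
    out_shape = a.shape[:-1]                    # shape of the result
    n_rows = int(np.prod(out_shape))            # 1 for a 1d input
    rows = a.reshape(n_rows, a.shape[-1])       # 2d: (number of 1d slices, axis length)
    out = np.empty(n_rows, dtype=np.float64)    # float64 output is an assumption
    for i in range(n_rows):                     # in Cython this loop would be typed
        out[i] = func1d(rows[i])
    return out.reshape(out_shape)
```

For example, reduce_along_axis(np.median, x, axis=1) should match np.median(x, axis=1); a cython version would replace func1d with a typed reducing function and the Python loop with a typed one.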

Does anyone have a cython apply_along_axis that takes a cython
reducing function as input? The ticket has an example but I couldn't
get it to run. If no one has one (the horror!) I'll begin to work on
one sometime after the first release.


