[Numpy-discussion] quickselect via np.partition available in 1.8.dev

Julian Taylor jtaylor.debian at googlemail.com
Mon Aug 12 10:23:34 EDT 2013


Hi,
a selection algorithm [0] has now landed in the numpy development branch 
[1].
The function exposing it is:
numpy.partition(data, kth=int/array, axis=-1, kind="introselect", 
order=None)
Please see the docstrings on what it actually does (and report if they 
are confusing).

Thanks to the numpy developers for the review and thanks to all who 
provided comments.

If you have a program which might benefit from using selection instead 
of sorting please try it out and report if you are happy with its result 
and the api.

As first function in numpy median has been converted to use partition so 
it now scales linear with the datasize instead of linearithmic. You can 
probably expect a five times speedup for array sizes around 1e6.

But this also involves a slight change in behavior for the case where 
you use overwrite_input=True
In the past this sorted the full array, now it will only be partially 
sorted. That the array is sorted was always documented as an 
implementation detail so hopefully that won't break any code.

The next function to be adapted will most likely be numpy.percentile.

[0] http://en.wikipedia.org/wiki/Selection_algorithm
[1] https://github.com/numpy/numpy/pull/3360



More information about the NumPy-Discussion mailing list