[Numpy-discussion] quickselect via np.partition available in 1.8.dev
Julian Taylor
jtaylor.debian at googlemail.com
Mon Aug 12 10:23:34 EDT 2013
Hi,
a selection algorithm [0] has now landed in the numpy development branch
[1].
The function exposing it is:
numpy.partition(data, kth=int/array, axis=-1, kind="introselect",
order=None)
Please see the docstrings on what it actually does (and report if they
are confusing).
Thanks to the numpy developers for the review and thanks to all who
provided comments.
If you have a program which might benefit from using selection instead
of sorting please try it out and report if you are happy with its result
and the api.
As first function in numpy median has been converted to use partition so
it now scales linear with the datasize instead of linearithmic. You can
probably expect a five times speedup for array sizes around 1e6.
But this also involves a slight change in behavior for the case where
you use overwrite_input=True
In the past this sorted the full array, now it will only be partially
sorted. That the array is sorted was always documented as an
implementation detail so hopefully that won't break any code.
The next function to be adapted will most likely be numpy.percentile.
[0] http://en.wikipedia.org/wiki/Selection_algorithm
[1] https://github.com/numpy/numpy/pull/3360
More information about the NumPy-Discussion
mailing list