[Numpy-discussion] quickselect

Julian Taylor jtaylor.debian at googlemail.com
Sun Jun 9 11:27:40 EDT 2013


On 09.06.2013 12:10, josef.pktd at gmail.com wrote:
> On Wed, May 29, 2013 at 3:19 PM,  <josef.pktd at gmail.com> wrote:
>> On Wed, May 29, 2013 at 12:25 PM, Julian Taylor
>> <jtaylor.debian at googlemail.com> wrote:
>>> On 05/29/2013 06:12 AM, josef.pktd at gmail.com wrote:
>>>>
>>>> On Tue, May 28, 2013 at 6:31 PM, Charles R Harris
>>>> <charlesr.harris at gmail.com> wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> There is a PR adding quickselect to numpy as a function `partition`.
>>>>> Comments on name and exposure in the numpy API are welcome.
>>>>
>>>>


here a a quick status report on the PR
https://github.com/numpy/numpy/pull/3360

I now implemented partitioning via the introselect algorithm which is a
quickselect median of 3 pivot with a cutoff on recursion depth to a
median of median of 5 pivot for O(N) worst case complexity.
Additionally it can stores its pivots for reuse in other partitions on
the same data to reduce the space required to be partitioned next time,
this is useful e.g. for multiple percentiles.

It is functionally ready, but there are still some API/ABI issues to be
solved.
Mainly deciding if we put the selection algorithms in the ABI for 1.8 or
not. Currently the ABI is unchanged so user types cannot make use of the
algorithms (they will fall back to quicksort).

The python api is now:
np.partition(array, kth)
where kth is an integer or array of integers

it will move each index in kth into its final sorted position, so
np.partition(a, range(a.size)) results in a (inefficient) sort.
e.g.:

d = np.array([66, 81, 21, 75, 46, -6, 66, 86, 242, 47, 88, 79])
np.partition(d, (2, -2)) # (2, 8)
array([ -6,  21,  46,  47,  75,  66,  66,  79,  81,  86,  88, 242])

Multidimensional arrays will use the same array of kth, you cannot
partition each axis by different values, you would have to explicitly
loop to do that.

Median is implemented in terms of partitioning already, but percentile
is not.
I would suggest someone else than me gives a try at implementing
percentile in terms of partition to see if the documentation and api
make sense to others.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: OpenPGP digital signature
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20130609/ecd0011e/attachment.sig>


More information about the NumPy-Discussion mailing list