[Numpy-discussion] quickselect

Tue Jun 11 08:37:41 EDT 2013

On Jun 9, 2013, at 11:27 AM, Julian Taylor <jtaylor.debian at googlemail.com> wrote:

> On 09.06.2013 12:10, josef.pktd at gmail.com wrote:
>> On Wed, May 29, 2013 at 3:19 PM,  <josef.pktd at gmail.com> wrote:
>>> On Wed, May 29, 2013 at 12:25 PM, Julian Taylor
>>> <jtaylor.debian at googlemail.com> wrote:
>>>> On 05/29/2013 06:12 AM, josef.pktd at gmail.com wrote:
>>>>> 
>>>>> On Tue, May 28, 2013 at 6:31 PM, Charles R Harris
>>>>> <charlesr.harris at gmail.com> wrote:
>>>>>> 
>>>>>> Hi All,
>>>>>> 
>>>>>> There is a PR adding quickselect to numpy as a function `partition`.
>>>>>> Comments on name and exposure in the numpy API are welcome.
>>>>> 
>>>>> 
> 
> 
> here a a quick status report on the PR
> https://github.com/numpy/numpy/pull/3360
> 
> I now implemented partitioning via the introselect algorithm which is a
> quickselect median of 3 pivot with a cutoff on recursion depth to a
> median of median of 5 pivot for O(N) worst case complexity.
> Additionally it can stores its pivots for reuse in other partitions on
> the same data to reduce the space required to be partitioned next time,
> this is useful e.g. for multiple percentiles.
> 
> It is functionally ready, but there are still some API/ABI issues to be
> solved.
> Mainly deciding if we put the selection algorithms in the ABI for 1.8 or
> not. Currently the ABI is unchanged so user types cannot make use of the
> algorithms (they will fall back to quicksort).
> 
> The python api is now:
> np.partition(array, kth)
> where kth is an integer or array of integers
> 
> it will move each index in kth into its final sorted position, so
> np.partition(a, range(a.size)) results in a (inefficient) sort.
> e.g.:
> 
> d = np.array([66, 81, 21, 75, 46, -6, 66, 86, 242, 47, 88, 79])
> np.partition(d, (2, -2)) # (2, 8)
> array([ -6,  21,  46,  47,  75,  66,  66,  79,  81,  86,  88, 242])
> 
> Multidimensional arrays will use the same array of kth, you cannot
> partition each axis by different values, you would have to explicitly
> loop to do that.
> 
> Median is implemented in terms of partitioning already, but percentile
> is not.
> I would suggest someone else than me gives a try at implementing
> percentile in terms of partition to see if the documentation and api
> make sense to others.
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

Julian,

Since I am the author of the current percentile PR (https://github.com/numpy/numpy/pull/2970), I'm willing to try reimplementing percentile with the new partition functionality.  I don't expect to have time to do this until the Scipy conference in two week, so if anyone else wants to give the implementation a try please feel free.  Julian will you be at Scipy this year if I have any questions?  

	Cheers,

	- Jonathan Helmus