[Numpy-discussion] quickselect
Jonathan J. Helmus
jjhelmus at gmail.com
Tue Jun 11 08:37:41 EDT 2013
On Jun 9, 2013, at 11:27 AM, Julian Taylor <jtaylor.debian at googlemail.com> wrote:
> On 09.06.2013 12:10, josef.pktd at gmail.com wrote:
>> On Wed, May 29, 2013 at 3:19 PM, <josef.pktd at gmail.com> wrote:
>>> On Wed, May 29, 2013 at 12:25 PM, Julian Taylor
>>> <jtaylor.debian at googlemail.com> wrote:
>>>> On 05/29/2013 06:12 AM, josef.pktd at gmail.com wrote:
>>>>>
>>>>> On Tue, May 28, 2013 at 6:31 PM, Charles R Harris
>>>>> <charlesr.harris at gmail.com> wrote:
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> There is a PR adding quickselect to numpy as a function `partition`.
>>>>>> Comments on name and exposure in the numpy API are welcome.
>>>>>
>>>>>
>
>
> here a a quick status report on the PR
> https://github.com/numpy/numpy/pull/3360
>
> I now implemented partitioning via the introselect algorithm which is a
> quickselect median of 3 pivot with a cutoff on recursion depth to a
> median of median of 5 pivot for O(N) worst case complexity.
> Additionally it can stores its pivots for reuse in other partitions on
> the same data to reduce the space required to be partitioned next time,
> this is useful e.g. for multiple percentiles.
>
> It is functionally ready, but there are still some API/ABI issues to be
> solved.
> Mainly deciding if we put the selection algorithms in the ABI for 1.8 or
> not. Currently the ABI is unchanged so user types cannot make use of the
> algorithms (they will fall back to quicksort).
>
> The python api is now:
> np.partition(array, kth)
> where kth is an integer or array of integers
>
> it will move each index in kth into its final sorted position, so
> np.partition(a, range(a.size)) results in a (inefficient) sort.
> e.g.:
>
> d = np.array([66, 81, 21, 75, 46, -6, 66, 86, 242, 47, 88, 79])
> np.partition(d, (2, -2)) # (2, 8)
> array([ -6, 21, 46, 47, 75, 66, 66, 79, 81, 86, 88, 242])
>
> Multidimensional arrays will use the same array of kth, you cannot
> partition each axis by different values, you would have to explicitly
> loop to do that.
>
> Median is implemented in terms of partitioning already, but percentile
> is not.
> I would suggest someone else than me gives a try at implementing
> percentile in terms of partition to see if the documentation and api
> make sense to others.
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
Julian,
Since I am the author of the current percentile PR (https://github.com/numpy/numpy/pull/2970), I'm willing to try reimplementing percentile with the new partition functionality. I don't expect to have time to do this until the Scipy conference in two week, so if anyone else wants to give the implementation a try please feel free. Julian will you be at Scipy this year if I have any questions?
Cheers,
- Jonathan Helmus
More information about the NumPy-Discussion
mailing list