[Numpy-discussion] Sorting refactor

Jaime Fernández del Río jaime.frio at gmail.com
Fri Jan 16 19:11:55 EST 2015


On Fri, Jan 16, 2015 at 8:15 AM, Charles R Harris <charlesr.harris at gmail.com
> wrote:

>
>
> On Fri, Jan 16, 2015 at 7:11 AM, Jaime Fernández del Río <
> jaime.frio at gmail.com> wrote:
>
>> On Fri, Jan 16, 2015 at 3:33 AM, Lars Buitinck <larsmans at gmail.com>
>> wrote:
>>
>>> 2015-01-16 11:55 GMT+01:00  <numpy-discussion-request at scipy.org>:
>>> > Message: 2
>>> > Date: Thu, 15 Jan 2015 21:24:00 -0800
>>> > From: Jaime Fern?ndez del R?o <jaime.frio at gmail.com>
>>> > Subject: [Numpy-discussion] Sorting refactor
>>> > To: Discussion of Numerical Python <numpy-discussion at scipy.org>
>>> > Message-ID:
>>> >         <
>>> CAPOWHWkF6RnWcrGmcwsmq_LO3hShjgBVLsrN19z-MDPe25E2Aw at mail.gmail.com>
>>> > Content-Type: text/plain; charset="utf-8"
>>> >
>>> > This changes will make it easier for me to add a Timsort generic type
>>> > function to numpy's arsenal of sorting routines. And I think they have
>>> > value by cleaning the source code on their own.
>>>
>>> Yes, they do. I've been looking at the sorting functions as well and
>>> I've found the following:
>>>
>>> * The code is generally hard to read because it prefers pointer over
>>> indices. I'm wondering if it would get slower using indices. The
>>> closer these algorithms are to the textbook, the easier to insert
>>> fancy optimizations.
>>>
>>
>> They are harder to read, but so cute to look at! C code just wouldn't
>> feel the same without some magical pointer arithmetic thrown in here and
>> there. ;-)
>>
>
> Pointers were faster than indexing. That advantage can be hardware
> dependent, but for small numbers of pointers is typical.
>
>
>>
>>
>>> * The heap sort exploits undefined behavior by using a pointer that
>>> points before the start of the array. However, rewriting it to always
>>> point within the array made it slower. I haven't tried rewriting it
>>> using indices
>>
>>
> Fortran uses the same pointer trick for one based indexing, or at least
> the old DEC compilers did ;) There is no reason to avoid it.
>
>
>> .
>>>
>>> * Quicksort has a quadratic time worst case. I think it should be
>>> turned into an introsort [1] for O(n log n) worst case; we have the
>>> heapsort needed to do that.
>>>
>>> * Quicksort is robust to repeated elements, but doesn't exploit them.
>>> It can be made to run in linear time if the input array has only O(1)
>>> distinct elements [2]. This may come at the expense of some
>>> performance on arrays with no repeated elements.
>>>
>>
>> Java famously changed its library implementation of quicksort to a dual
>> pivot one invented by Vladimir Yaroslavskiy[1], they claim that with
>> substantial performance gains. I tried to implement that for numpy [2], but
>> couldn't get it to work any faster than the current code.
>>
>
> For sorting, simple often beats fancy.
>
>
>>
>> * Using optimal sorting networks instead of insertion sort as the base
>>> case can speed up quicksort on float arrays by 5-10%, but only if NaNs
>>> are moved out of the way first so that comparisons become cheaper [3].
>>> This has consequences for the selection algorithms that I haven't
>>> fully worked out yet.
>>>
>>
>>
> I expect the gains here would be for small sorts, which tend to be
> dominated by call overhead.
>
>
>> Even if we stick with selection sort, we should spin it off into an
>> inline smallsort function within the npy_sort library, and have quicksort
>> and mergesort call the same function, instead of each implementing their
>> own. It would make optimizations like the sorting networks easier to
>> implement for all sorts. We could even expose it outside npy_sort, as there
>> are a few places around the code base that have ad-hoc implementations of
>> sorting.
>>
>
> Good idea, I've thought of doing it myself.
>
>
>>
>>> * Using Cilk Plus to parallelize merge sort can make it significantly
>>> faster than quicksort at the expense of only a few lines of code, but
>>> I haven't checked whether Cilk Plus plays nicely with multiprocessing
>>> and other parallelism options (remember the trouble with OpenMP-ified
>>> OpenBLAS?).
>>>
>>> This isn't really an answer to your questions, more like a brain dump
>>> from someone who's stared at the same code for a while and did some
>>> experiments. I'm not saying we should implement all of this, but keep
>>> in mind that there are some interesting options besides implementing
>>> timsort.
>>>
>>
>> Timsort came up in a discussion several months ago, where I proposed
>> adding a mergesorted function (which I have mostly ready, by the way, [3])
>> to speed-up some operations in arraysetops. I have serious doubts that it
>> will perform comparably to the other sorts unless comparisons are terribly
>> expensive, which they typically aren't in numpy, but it has been an
>> interesting learning exercise so far, and I don't mind taking it all the
>> way.
>>
>> Most of my proposed original changes do not affect the core sorting
>> functionality, just the infrastructure around it. But if we agree that
>> sorting has potential for being an actively developed part of the code
>> base, then cleaning up its surroundings for clarity makes sense, so I'm
>> taking your brain dump as an aye for my proposal. ;-)
>>
>
> I have a generic quicksort with standard interface sitting around
> somewhere in an ancient branch. Sorting objects needs to be sensitive to
> comparison exceptions, which is something to keep in mind. I'd also like to
> push the GIL release back down into the interface functions where it used
> to be, but that isn't a priority. Another other possibility I've toyed with
> is adding a step for sorting non-contiguous arrays, but the sort functions
> being part of the dtype complicates that for compatibility reasons. I
> suppose that could be handled with interface functions. I think the
> prototypes should also be regularized.
>
> Cleaning up the sorting dispatch to use just one function and avoid the
> global would be good, the current code is excessively ugly. That cleanup,
> together with a generic quicksort, would be a good place to start.
>

Let the fun begin then.. I have just sent PR #5458, in case anyone wants to
take a look.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20150116/43ffc480/attachment.html>


More information about the NumPy-Discussion mailing list