new sorting algorithm

Mon May 2 09:09:48 EDT 2022

On 5/1/2022 7:45 PM, Chris Angelico wrote:
> On Mon, 2 May 2022 at 09:20, Dan Stromberg <drsalists at gmail.com> wrote:
>>
>>
>> On Sun, May 1, 2022 at 1:44 PM Chris Angelico <rosuav at gmail.com> wrote:
>>>
>>> On Mon, 2 May 2022 at 06:43, Dan Stromberg <drsalists at gmail.com> wrote:
>>>> On Sun, May 1, 2022 at 11:10 AM Chris Angelico <rosuav at gmail.com> wrote:
>>>>>
>>>>> On Mon, 2 May 2022 at 01:53, Nas Bayedil <nbayedil at gmail.com> wrote:
>>>>>> We believe that using this method to develop completely new, fast
>>>>>> algorithms, approaching the speed of the famous *QuickSort*, the speed of
>>>>>> which cannot be surpassed, but its drawback can be circumvented, in the
>>>>>> sense of stack overflow, on some data.
>>>>>
>>>>> Hmm, actually TimSort *does* exceed the speed of quicksort for a lot
>>>>> of real-world data. For instance, if you take a large sorted list,
>>>>> append a handful of (unsorted) items to it, and then sort the list,
>>>>> TimSort can take advantage of the fact that the bulk of the list is
>>>>> sorted. It ends up significantly faster than re-sorting the entire
>>>>> list.
>>>>
>>>>
>>>> In fact, Timsort is O(n) for already-sorted data, while many quicksorts are O(n^2) for already-sorted data.
>>>>
>>>> Quicksort can be salvaged by using a median-of-3 partitioning, but it's still O(n^2) in the (less common) worst case.
>>>>
>>>
>>> This is true, but purely sorted data isn't a very practical case. The
>>> case of mostly-sorted data IS practical, though, so it's a quite big
>>> win that it can be close to O(n), and still faster than inserting each
>>> item individually.
>>
>>
>> You seem to be of the impression that nearly-sorted data isn't an uphill battle with a straightforward quicksort.
>>
>> I'm having a hard time convincing myself of that.
>>
> 
> The median-of-three partitioning technique makes that work reasonably
> well, so it won't be pathologically slow. It's hardly Quicksort's best
> feature, but it could easily be a lot worse. I'd have to check, but I
> think it still manages to be O(n log n). Merge sort, of course, is a
> lot more consistent, but the asymptotic cost is still broadly the
> same.
> 
> But Timsort manages to be close to O(n) for sorted data, reversed
> data, nearly-sorted or nearly-reversed data, etc. Makes it very handy
> for jobs like "click a heading to sort by it", where you might add
> multiple sort keys.
> 
> (Plus, Python's implementation has some cool tricks for small
> collections that make it quite efficient.)
> 
> ChrisA

Some versions of Quicksort switch over to Straight Insertion Sort when 
the partitions become small enough. The correct size will vary depending 
on the hardware.

I have not kept up with the latest improvements and I am not familiar 
with TimSort.  However Heapsort  should always be O(n log n) and there 
are modifications to Heapsort that can make it faster than vanilla 
Heapsort and closer to the speed of Quicksort.