new sorting algorithm

Sun May 1 19:45:32 EDT 2022

On Mon, 2 May 2022 at 09:20, Dan Stromberg <drsalists at gmail.com> wrote:
>
>
> On Sun, May 1, 2022 at 1:44 PM Chris Angelico <rosuav at gmail.com> wrote:
>>
>> On Mon, 2 May 2022 at 06:43, Dan Stromberg <drsalists at gmail.com> wrote:
>> > On Sun, May 1, 2022 at 11:10 AM Chris Angelico <rosuav at gmail.com> wrote:
>> >>
>> >> On Mon, 2 May 2022 at 01:53, Nas Bayedil <nbayedil at gmail.com> wrote:
>> >> > We believe that using this method to develop completely new, fast
>> >> > algorithms, approaching the speed of the famous *QuickSort*, the speed of
>> >> > which cannot be surpassed, but its drawback can be circumvented, in the
>> >> > sense of stack overflow, on some data.
>> >>
>> >> Hmm, actually TimSort *does* exceed the speed of quicksort for a lot
>> >> of real-world data. For instance, if you take a large sorted list,
>> >> append a handful of (unsorted) items to it, and then sort the list,
>> >> TimSort can take advantage of the fact that the bulk of the list is
>> >> sorted. It ends up significantly faster than re-sorting the entire
>> >> list.
>> >
>> >
>> > In fact, Timsort is O(n) for already-sorted data, while many quicksorts are O(n^2) for already-sorted data.
>> >
>> > Quicksort can be salvaged by using a median-of-3 partitioning, but it's still O(n^2) in the (less common) worst case.
>> >
>>
>> This is true, but purely sorted data isn't a very practical case. The
>> case of mostly-sorted data IS practical, though, so it's a quite big
>> win that it can be close to O(n), and still faster than inserting each
>> item individually.
>
>
> You seem to be of the impression that nearly-sorted data isn't an uphill battle with a straightforward quicksort.
>
> I'm having a hard time convincing myself of that.
>

The median-of-three partitioning technique makes that work reasonably
well, so it won't be pathologically slow. It's hardly Quicksort's best
feature, but it could easily be a lot worse. I'd have to check, but I
think it still manages to be O(n log n). Merge sort, of course, is a
lot more consistent, but the asymptotic cost is still broadly the
same.

But Timsort manages to be close to O(n) for sorted data, reversed
data, nearly-sorted or nearly-reversed data, etc. Makes it very handy
for jobs like "click a heading to sort by it", where you might add
multiple sort keys.

(Plus, Python's implementation has some cool tricks for small
collections that make it quite efficient.)

ChrisA