new sorting algorithm

Chris Angelico rosuav at gmail.com
Sun May 1 19:45:32 EDT 2022


On Mon, 2 May 2022 at 09:20, Dan Stromberg <drsalists at gmail.com> wrote:
>
>
> On Sun, May 1, 2022 at 1:44 PM Chris Angelico <rosuav at gmail.com> wrote:
>>
>> On Mon, 2 May 2022 at 06:43, Dan Stromberg <drsalists at gmail.com> wrote:
>> > On Sun, May 1, 2022 at 11:10 AM Chris Angelico <rosuav at gmail.com> wrote:
>> >>
>> >> On Mon, 2 May 2022 at 01:53, Nas Bayedil <nbayedil at gmail.com> wrote:
>> >> > We believe that using this method to develop completely new, fast
>> >> > algorithms, approaching the speed of the famous *QuickSort*, the speed of
>> >> > which cannot be surpassed, but its drawback can be circumvented, in the
>> >> > sense of stack overflow, on some data.
>> >>
>> >> Hmm, actually TimSort *does* exceed the speed of quicksort for a lot
>> >> of real-world data. For instance, if you take a large sorted list,
>> >> append a handful of (unsorted) items to it, and then sort the list,
>> >> TimSort can take advantage of the fact that the bulk of the list is
>> >> sorted. It ends up significantly faster than re-sorting the entire
>> >> list.
>> >
>> >
>> > In fact, Timsort is O(n) for already-sorted data, while many quicksorts are O(n^2) for already-sorted data.
>> >
>> > Quicksort can be salvaged by using a median-of-3 partitioning, but it's still O(n^2) in the (less common) worst case.
>> >
>>
>> This is true, but purely sorted data isn't a very practical case. The
>> case of mostly-sorted data IS practical, though, so it's a quite big
>> win that it can be close to O(n), and still faster than inserting each
>> item individually.
>
>
> You seem to be of the impression that nearly-sorted data isn't an uphill battle with a straightforward quicksort.
>
> I'm having a hard time convincing myself of that.
>

The median-of-three partitioning technique makes that work reasonably
well, so it won't be pathologically slow. It's hardly Quicksort's best
feature, but it could easily be a lot worse. I'd have to check, but I
think it still manages to be O(n log n). Merge sort, of course, is a
lot more consistent, but the asymptotic cost is still broadly the
same.

But Timsort manages to be close to O(n) for sorted data, reversed
data, nearly-sorted or nearly-reversed data, etc. Makes it very handy
for jobs like "click a heading to sort by it", where you might add
multiple sort keys.

(Plus, Python's implementation has some cool tricks for small
collections that make it quite efficient.)

ChrisA


More information about the Python-list mailing list