Q: sort's key and cmp parameters

Tue Oct 6 16:09:56 EDT 2009

Paul Rubin:

> bearophile:
>> I am having a very hard type (despite my implementation, explanations,
>> etc) explaining to D programmers why for a flexible general-purpose
>> function a key function argument is often better. They in the end have
>> added something like that in the std lib, but with a weird name like
>> SchwartzSort that's surely not the first standard sort they look
>> for... this is bad.

> "key" and the DSU (Schwartz) sorting pattern makes lots of sense in
> scripting languages like Python that are primarily used on relatively
> small data sets.  The best reason for writing something in a lower
> level language like D is because you want to push the limits of your
> availiable machine resources.  Therefore, the DSU strategy's extra
> memory consumption will be what you want less of the time than it is
> in Python.

I think you are quite wrong.

In D dynamic arrays are built-in, they are used almost as Python lists
(but they are single-typed, unless you create an array of variants),
among other things they have a built-in sort. Such sort is used for
small situations too. Even in a language like D *very* often what you
need is to sort a small amount of data in a very flexible way. In such
situations what you need isn't to squeeze the last byte or CPU cycle
out of the PC, but to have a handy and short syntax, a very flexible
sorting, and something that's surely not bug-prone. In such situation
having a 'key' argument is *better*. Such sort can be stable. That's
why the main sort/sorted routines in my dlibs accept an optional key
callable, and they are very handy. I can show you examples.

On the other hand once in a while in a D program you may need to sort
a very large amount of data, or you may need something faster (and
unstable, like a refined introsort based on a 2-pivot QS), or even
more special than the things allowed by the built in sort. In such
situation you can import a special sorting function from the std lib
and use it, such sort can have something equivalent to a cmp (but
lower level, it can accept a function template alias, to inline the
comparison operation).

So the idea is: in a multi-level language you very often have to sort
a limited amount of data in complex ways. Because in most programs a
quite large percentage of the code isn't performance-critical. In such
large parts of the code what you want is something very handy, safe
and flexible. So having a key function is positive. In the few spots
where you need high performance, you can import (or even implement
with your hands) and use a specialized sorting routine. D is a general
purpose language, it's not used like C in Python projects to write
performance-critical spots only.

This is also visible in other parts of the language, for example there
are built-in associative arrays. Their syntax is handy, and even if
they aren't high performance (they are sometimes slower than Python
dicts despite being O(1), but they never have a O(n^2) performance
profile, as may happen to Python dicts, so they are safer) you can use
them in a very quick way, even if you have to create a 10-items
associative array. The end result is that in D programs you can find
more hashes than in C++ code, where I think programmers use them only
where simpler solutions (like iterating on an array) aren't good
enough. (So D AAs have to be fast even in situations where you put 8
items inside them, like in Python dicts). This free usage of AAs makes
the code stronger too (because once in a while you may put 1000 items
in such AA, and it will keeps working efficiently, while a linear scan
in a list of 1000 items starts to become not efficient).

Bye,
bearophile