Q: sort's key and cmp parameters

Bearophile bearophileHUGS at lycos.com
Wed Oct 7 10:41:58 EDT 2009


Hrvoje Niksic:

> Note that stable sort has additional memory requirements.  In situations
> where you don't need stability, but do need memory-efficient in-place
> sorting, an unstable sort might well be preferred.  This is why
> libraries such as C++'s STL offer both.

There are stable sorts that need no additional memory.

In a largish program written in a general-purpose multi-level
language, like D (this is probably true for C++ too), often 80% of the
code takes a very little time to run. Such code may need to process
few small arrays, or small data sets, or to manage the GUI, etc.

So optimizing those parts of the code for performance (both in memory
used and CPU cycles used) is stupid. What you want in such large parts
of the code is:
- to write code quickly
- to have code that's short, readable, easy to fix, and most important
of all that is the less bug-prone as possible.

So what you need in such large parts of the code is something that's
very flexible and safe, even if it's not top performance (both in
memory and CPU).

This is why for example in D there are built-in associative arrays,
that have a handy syntax, built-in methods, and they are never O(n^2),
even if they are not the most efficient ones where you need max
performance, or where you need minimal memory used, or where you need
to perform unusual operations. Such built-ins are useful to reduce
both code length and bug count, because they are quite safe and easy
to use.

Stable sorts are a little safer, because they don't scramble your data
in certain ways, so they can be used in more situations. This is why
having the Timsort in Python is good.

When you run profile the D code and you see your program uses too much
RAM or wastes too much time in the built-in sort, then you can switch
to using special sorts from the std lib. You can even write your own
hash or sort for special situations (and you don't have to drop down
to use another language for that, you keep using the same, even if you
may want to change your programming stile, and use a more C-like
style, with no dynamic allocations, some bit twidding, pointers, more
structs, 16 bit integers, unsigned integers, unions, compiler
intrinsics, even inlined assembly that uses SSE3, etc). In normal code
you may want to adopt a more Java-like programming style, or
templates, or even using a large library I have written, you may
program it almost as a curious Python-like, with lazyness too.

This is why I think the built-in sort has to be flexible, because you
are supposed to use it most of the times, and most of the times you
don't need top performance. Different parts of a program have to be
optimized for different things, computer performance, or programmer
performance.

Bye,
bearophile



More information about the Python-list mailing list