Surprising (for me) benchmark results...

Tue May 1 23:58:26 EDT 2001

[Daniel Berlin]
> It [Python's list.sort()] doesn't beat a very well engineered
> radix sort, or even multikey quicksort (a merger of quicksort
> and radix sort, basically), for sorting strings.

Python's list.sort(), like C's qsort(), is constrained to indirect thru a
three-outcome cmp(PyObject* x, PyObject* y) function.  The sort itself has no
access to "the bits" ("the keys", "the guts", however you want to look at
it), although the comparison function may.  The sort routine makes no
assumptions about the comparison function, or the types of the objects in the
list (heterogeneous lists are fine -- mixing strings and tuples and lists and
floats and class instances etc); in general, the cmp function used in
Python-- and every time it's called --does double-dispatch on the argument
types, may do dynamic coercions, and can execute arbitrary Python and C code
to arrive at a result.  As a side effect, the comparison function may even
change the types of the objects being compared.  If you can write a radix
sort based solely on that kind of cmp(), you're the wizard we've been looking
for <wink>.

> I tested it.  Though you are talking about the difference between
> 1 second and .25 second, on *very* large numbers of strings.

Make the strings very short, then, *or* make a million strings with a common
100-character prefix:  the more a specialized sort can exploit that it's
looking at strings and only strings, the higher the relative burden of the
Python sort's generality.

all-purpose-vs-one-purpose-ly y'rs  - tim