Incomparable abominations (was: python-dev Summary)

Wed Mar 19 14:12:53 EST 2003

I don't usually read Python-Dev, although I suppose I should more (time,
time, time).  The latest summary prompted a look, which made me notice
Alex Martelli's well-founded complaint about the breakage in sorting
list with complex numbers in them:

    http://mail.python.org/pipermail/python-dev/2003-March/034065.html

Guido subsequently defended the erratic and unintuitive behavior of
Python comparisons.  Andrew Koenig made an excellent observation in the
thread:

  In the first case, we have a total ordering; in the second, we have
  what C++ calls a "strict weak ordering", which is really an ordering
  of equivalence classes.

I completely DO NOT BUY Guido's point in the thread.  The failure of
.sort() depending on obscure combinations of list contents is, to my
mind, the BIGGEST WART that Python has.

The argument is sometimes made that it is a programming mistake to try
to compare items of incomensurable types.  But current Python is very
happy to compare a string to a number, for example.  It would take a
radical overhaul to get rid of all the arbitrary (but stable)
comparisons.  In any case, when I see .sort(), I really want a (stable)
total ordering, even understanding that the ordering is arbitrary across
types.  It is quite sensible, and not uncommon, to collect various
types of things within dict.keys(), for example.

Moreover, from my perspective, the same applies to incommensurabilities
involving Unicode strings.  I've had it explained to me before that I
should believe this doesn't really count because I need to specify an
encoding.  Nonsense!  I want the ability to perform a total ordering on
Python objects, at least within a sort.

Here is some erratic behavior (most of it introduced, BTW, in *Python
2.1*; even 2.0 is OK with complex numbers).  Try to guess when an
exception will be raised (and in which Python version):

    ['x','y','z'].sort()
    ['x','y','z', 1j].sort()
    ['x','y','z', 1j, 1].sort()
    ['x','y','z', 1, 2].sort()
    [0j, 1j, 2j].sort()                 # An obvious "natural" order
    [0j, 1, 2].sort()
    [0, 1, 2].sort()                    # Notice that 0==0j --> True
    [chr(120), chr(240)].sort()
    [chr(120), chr(240), 'x'].sort()
    [chr(120), chr(240), u'x'].sort()   # Notice u'x'=='x' --> True
    [chr(120), u'x'].sort()
    [chr(120), 'x', u'x'].sort()
    [chr(120), 'x', u'x', 1j].sort()

A custom sorting function can handle these inconsistencies--very slowly.
What would be MUCH nicer would be to have a Python 2.0-style
".stablesort()" method (almost, Unicode "broke" then already).  Ideally,
this method would do something -sensible- within types, and something
-consistent- between types (at least consistent within one run of the
interpreter, not necessarily any more than that).

Yours, David...