Python 3.0, rich comparisons and sorting order

Thu Sep 23 08:28:03 EDT 2004

Carlos Ribeiro <carribeiro at gmail.com> wrote:
   ...
> > You could also do that passing a function to sort(). 
>  
> I know it, and that's exactly my question -- if this is going to be
> the way to do it in Python 3000. Today (2.4) there are FOUR ways to
> make it work: passing a compare function as an argument to sort(),
> passing a key funciton as an argument to sort(), implementing a
> __cmp__ function, or implementing the rich comparison methods. If the

What you can do with key= and with cmp= are not exactly the same things.
key= normally allows WAY faster sorts, when applicable, thanks to
decorate-sort-undecorate... but that only really holds when the key is a
simple structure of elementary types using builtin comparisons.
Besides, speed isn't all: sometimes extracting a key is conceptually
simpler, sometimes it's simpler to express the desired comparison in
operational terms.  Speed isn't everything: simplicity matters, too.

A simplification that I think can be made is to have sorting hinge
strictly on __lt__, the < operator.  That will, however, no doubt still
leave indirect avenues open.  For example, it would be somewhat
surprising if object didn't have a default __lt__ delegating to __cmp__,
so it would still be possible, in your own types, to define either
__lt__, or __cmp__.  Much like the distinctions between __add__,
__iadd__, __radd__, I suspect these conveniences WILL remain.

> main goal of Py3K  is to have only one obvious way to do it, what one

_Preferably_ only one, just like it's always been the case in Python.

But practicality beats purity.  You will still be allowed, I predict, to
sum two numbers by a+b, b+a, sum([a,b]), operator.add(a,b), and other
ways yet; trying to forbid some of these would be impractical.

> is it going to be? I think that the best way should be the simplest
> one: make sort() work whatever is passed to it in a reasonably way,

Errors should not pass silently, unless explicitly silenced.  MOST of
the time, a poor hapless user trying to compare 23 with "42", just like
one trying to sum these objects, is making a mistake.  Handing such
likely errors as if they weren't errors is NOT doing the poor hapless
user good service.  Thus, I find it quite reasonable that the _default_
behavior of sort be that desired well over 90% of the time, for
comparisons happening within sort just as well as for others: raise an
exception if I'm trying to compare strings with numbers (and the like).
For that less-than-10% of the time where I really _want_ to compare
_anything_, passing some kind of explicit indicator such as a keyword
parameter is just fine.  And I think the right way to do it is to have
an "universal_lt" off in a module somewhere and pass THAT when required.

It's hard to gauge exactly how frequently I need universal comparisons
(rather than type-specific ones), but I'd guess _maybe_ 5%, if that.
Sacrificing error detection in 95% of the cases for the sake of removing
the need to be explicit about the other 5% of the cases would be a very
grave design error, in my opinion.  Making that "other 5%" any more
problematic for the user than passing a simple keyword parameter or the
like would also be a mistake, but perhaps one of lesser importance than
what I think you're proposing -- universal comparisons _by default_.

> and decide to have one preferred way to extend its behavior.

A key= returning a simple structure (tuple or list), made up of
elementary items to compare 'naturally' by built-in means, is the
preferred way _when applicable_, for speed and simplicity, but it's not
always so applicable; therefore, sometimes other ways to tweak (not
necessarily 'extend'!) sorting will be better.

Asking for "one preferred way" to apply ALL of the times is like asking
for "one preferred arithmetic operator" -- can't you see it WILL depend
on what it IS that you're trying to do?!  If I had to nominate one
operator as "preferred" it would be addition -- most frequently, it
serves your typical purposes best -- but it's not all THAT infrequent to
meet a problem that's best solved by dividing instead, is it?

> > Define "reasonable result" with different datatypes. 
> 
> Now that's a harder question :-) Any ordering for multiple types is
> going to be arbitrary -- but it still may be considered reasonable if
> it reflects our common sense when it comes to ordering. In some cases,

That's probably too difficult, since there is no guarantee that our
common sense (were it to be determined) would be mathematically
consistent.  I can see how "by common sense" one might want a list made
of characters to compare consistently with other sequences made of
characters, such as tuples, strings, arrays, for example -- but it can't
be mathematically consistent with other "common sense" ideas about
comparing other groups of sequences.

"Common sense" would tell most people that 23<'42'<99, but that is
another example of where consistency probably can't be reached.  Etc,
etc.

Alex