Python 3.0, rich comparisons and sorting order

Carlos Ribeiro carribeiro at gmail.com
Wed Sep 22 08:21:29 EDT 2004


On Wed, 22 Sep 2004 11:03:51 +0200, Alex Martelli <aleaxit at yahoo.com> wrote:
> The fact that some such problems were observed in Python itself was used
> as an argument to justify not doing comparisons among het types in
> Python; I argue that pushing such subtle responsibilities down to Python
> _users_ is no progress.

That sums up my argument very well (thanks, Alex!). I think
that sort must simply work. *If* some type of ordering between
heterogeneous items is defined by the Python sort() behavior, then
it's much easier for any user (novice or expert) to simply refer to
that ordering when trying to understand how sort() works. Look at
what happens if this problem is moved to the user side:

-- extra care will be needed on the part of the programmer, because
sort() can raise TypeErrors in situations where it doesn't raise today
(even in a *very simple case*, such as having None values in the list,
like the ones returned by SQL queries);

-- worse performance, because a generic __cmp__ function written in
Python is bound to be orders of magnitude slower than the native
implementation;

-- in the absence of a standard, default ordering, each and every
programmer will define their own ordering when it comes to managing
heterogeneous data. If the overloaded function is exported, the user
will be left to guess about the ordering. In a way, it violates the
TOOWTDI motto.
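
To make the first point concrete, here is a minimal sketch of the kind
of helper each programmer would end up writing if comparing None with
a number raised TypeError. The name none_first and the decision to put
None ahead of everything else are my own assumptions for illustration;
nothing in sort() defines them.

```python
def none_first(value):
    """Hypothetical sort key: None sorts before any real value."""
    # The first tuple element separates None from non-None values.
    # The second element is only decisive between two non-None values;
    # for None we substitute a placeholder (0) that is never compared
    # against a real value, because the first element already differs.
    return (value is not None, value if value is not None else 0)


rows = [3, None, 1, None, 2]  # e.g. a column returned by an SQL query
ordered = sorted(rows, key=none_first)
print(ordered)
```

Another programmer could just as reasonably put None last, which is
exactly the "everyone defines their own ordering" problem.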

Talking about real applications:

-- SQL applications can return null values for non-initialized values.
Try ordering that.

-- OLE applications that return variants are bound to the same
problem. They may return null values, or any other type, and Python
will convert the type automatically. If these values are fed to a list
and then sorted, again, there may be problems.

-- The long tuples mentioned by Alex can simply be long strings.
Internationalized text, for instance -- to compute the hash, one
must iterate over the entire string; a balanced tree-based design will
compare only the first few characters in most cases. Hashing strings
is probably fast (a very tight loop, I assume), but beyond some string
size it is bound to get slower than the binary comparisons
mentioned.

-- A heap can no longer be used to hold heterogeneous objects.
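
As a rough sketch of the heap case, this is what the user-side
workaround could look like when mixed types do not compare directly.
The function heap_key and its "type name first, then value" rule are
assumptions of mine, chosen only because they give a consistent order;
the rule breaks down if two unrelated classes share a name.

```python
import heapq


def heap_key(obj):
    """Hypothetical total order: group by type name, then by value."""
    # Objects of different types are ordered by their type names, so
    # the values themselves are only compared within a single type.
    return (type(obj).__name__, obj)


items = [3, "pear", 1, "apple", 2.5]

heap = []
for obj in items:
    heapq.heappush(heap, heap_key(obj))

# Pop everything back out; each entry is (type_name, original_object).
drained = [heapq.heappop(heap)[1] for _ in range(len(heap))]
print(drained)
```

Note that 2.5 comes out before 1 here: the order is arbitrary, but it
is consistent, which is all a heap needs.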

And finally:

I think that many people are worried about what are really corner
cases -- sorting complex numbers or other complex data structures.
That's not what worries me, because in this situation we can safely
assume that the application is complex enough to deserve sort()
customization. *What I'm worried about is the lack of default ordering
for fundamental types*. Another issue that was raised is the ordering
between strings and numbers. Well, this particular case can be
discussed. As it is today, it's not that bad -- some people may forget
to convert numbers to strings, or be confused by the final ordering,
but at least *there's a standard ordering* that works for everyone
else.

In conclusion (I'm repeating myself again):

-- ordering, for sort purposes, is one thing -- it's *information
management* at work.
-- mathematical ordering is another thing.

The two concepts are *very similar*, but are still *different*, as far
as their real applications are concerned. In the case of sorting, what
matters is to have a unique and well defined ordering that can be used
for *information management* purposes. On the other hand, rich
comparisons are already being used for other purposes, valid in the
mathematical sense, that have nothing to do with ordering. Is that
clear or not?
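
A small illustration of the difference, using sets, where the rich
comparison operators mean "subset" rather than "sorts before". The key
function set_sort_key is a made-up example of an information-management
ordering, not anything the language prescribes.

```python
a = {1, 2}
b = {2, 3}

# For sets, < means "is a proper subset of". That is a valid
# mathematical relation, but only a partial order: neither set is
# "less than" the other, and they are not equal either.
neither = not (a < b) and not (b < a) and a != b


def set_sort_key(s):
    """Hypothetical total order for sets: compare their sorted elements."""
    return sorted(s)


# With an explicit key we get a unique, well-defined ordering that is
# good enough for sorting, even though it says nothing about subsets.
ordered = sorted([b, a, {1}], key=set_sort_key)
print(neither, ordered)
```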


p.s. It's good to have this discussion so far in advance of Py3K. This
way we can avoid having it at Py3K alpha, with very little time and a
small chance to win the argument against a committed patch.

-- 
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: carribeiro at gmail.com
mail: carribeiro at yahoo.com


