Would there be support for a more general cmp/cmp

Tue Oct 25 10:09:29 EDT 2005

Antoon Pardon wrote:

> I also think there is the problem that people aren't used to partial
> ordering. There is an ordering over sets, it is just not a total
> ordering. But that some pairs are uncomparable (meaning that neither
> one is smaller or greater) doesn't imply that comparing them is
> ill defined.

It depends on your definition of "comparison."  Admittedly, <, =, !=, 
and > can be defined for a partial ordering, but classical mathematics, 
as understood by most people (even programmers), assumes that unless a 
== b, a > b or a < b.

The comparisons, as defined this way, are done on a totally ordered set, 
yes.  But since most comparisons ever done -are- done on a totally 
ordered set (namely the integers or reals), it's entirely fair to say 
that "a programmer's expectation" is that comparisons should work more 
or less like a totally ordered list.

With that caevat in mind, comparison on a partially ordered domain /is/ 
ill-defined; it can give inconsistent results from the "a < b or a > b 
or a == b" rule.

> 
> Well it is a wrong assumption is general. There is nothing impure about
> partial ordering. 

Impure? Probably not.  Useless from many perspective when "comparisons" 
are needed, to the point where it's probably safer to throw an exception 
than define results.

> That is IMO irrelevant. The subset relationship is an ordering and as
> such has all characteristics of other orderings like "less than",
> except that the subset relationship is partial and the less than
> relationship is total. How it is called "subset" vs "less than" is
> IMO irrelevant. It is about mathematical characteristics.

Still accident.  < wouldn't be used for sets if we had a subset symbol 
on the standard keyboard, APL fans notwithstanding.  Simce programmers 
familiar with Python understand that < on sets isn't a "real" comparison 
(i.e. provide a total order), they don't expect unique, consistent 
results from something like sort (or a tree, for that matter).

> 
> 
>>By analogy, one can ask, "is the cat inside the box?" and get the answer
>>"No", but this does not imply that therefore the box must be inside the
>>cat.
> 
> 
> Bad analogy, this doesn't define a mathematical ordering, the subset
> relationship does.

Yes, it does.  Consider "in" as a mathematical operator:

For the set (box, cat-in-box)

box in box: False
box in cat-in-box: False
cat-in-box in box: True
cat-in-box in cat-in-box: False

For the set (box, smart-cat) # cat's too smart to get in the box

box in box: False
box in smart-cat: False
smart-cat in box: False
smart-cat in smart-cat: False

In both these cases, the "in" operator is irreflexive, asymmetric, and 
transitive (extend to mouse-in-cat if you really care about transitive), 
so "in" is a partial order sans equality.  A useless one, but a partial 
order nonetheless.

>>Notice that L1 and L2 contain the same elements, just in different orders.
>>Sorting those two lists should give the same order, correct?
> 
> 
> No. Since we don't have a total ordering, sorting doesn't has to give
> the same result. For as far as sorting is defined on this kind of
> object, the only thing I would expect is that after a sort the
> following condition holds.
> 
> for all i,j if i < j then not L[i] > L[j]

Highly formal, aren't you?  Again, common programming allows for "not a 
  > b" == "a <= b", so your condition becomes the more familiar:

for all i,j in len(list), if i < j then L[i] <= L[j]

This condition is also assumed implicitly by many other assumed "rules" 
of sorted lists, namely their uniqueness:

"If L is a list of unique keys, then sort(L) is a unique list of keys in 
sorted order."

Yes, this assumes that L has a total order.  Big whoop -- this is a 
matter of programming practiciality rather than mathematical purity.

>>Personally, I argue that sorting is something that you do to lists, and
>>that all lists should be sortable regardless of whether the objects within
>>them have a total order or not. If not, the list should impose a sensible
>>ordering on them, such as (perhaps) lexicographical order ("dictionary
>>order").

To reply to the grandparent poster here: that's not always easy to do. 
A "sensible order" isn't always easy to define on a set where a partial 
order exists, if it includes that order already.  Adding 
comparison-pairs to a partially ordered set (where incomparable elements 
throw an exception, rather than just return False which is confusing 
from sort's perspective) can easily result in a graph that isn't 
actually transitive and antisymmetric, i.e.:

A > B > C, but
C > D > A for some operator ">"

With this in mind, the only sensible thing for .sort to do when it 
encounters an exception is to fall back to its "backup" comparator (id, 
for example), and resort /the entire list/ using that comparator.  The 
results returned will then be valid by sort's comparison, but a subset 
of that list containing only "good" objects (like integers) may not (and 
probably won't be) sorted itself using the object's comparison.

The difficulty arises because while we may consider a single type to be 
fully ordered, it may also have some comparisons with other types; 
sorting the list by type, then value seems like it would work (and 
that's what the library does), but since we can sometimes compare across 
types, this may not give a valid comparison.  Consider:

[apple, wheat, cow]
apple < cow (defined, totally ordered)
Mammal > Grain > Fruit (accident of type id numbers)
apple < wheat throws TypeError, also wheat < cow

Since we encounter a TypeError when sorting this list, we first sort by 
type:
[cow, wheat, apple]
and would then sort-by-comparison within types, if we had more than one.

But since apple < cow, the sublist of this list [cow, apple] is not 
itself sorted.  This is paradoxical under a single comparison, because a 
sublist of any sorted list is itself supposed to be sorted.

The ordering that .sort used on the full list is well-defined and total, 
but it's also useless.

> But will all libraries sort the shelves in the same way. As far as I
> understand a book gets somekind of code in a library and books are
> sorted according to these codes. Now everybody sorts these codes the
> same way, but the attibution of the code may differ from library to
> library.

Irrelevant; if you swap the order of any two (nonidentical) books in a 
single library, then the library is unsorted.  The library has imposed a 
total order on books in its collection.

The library example is irrelevant, anyway; two books can always be 
compared by the keys (author last, author first/middle, [other authors], 
title, editor, year published, publisher, number of pages, colour of the 
brightest line in the cover's emmission spectrum at 2000k in spectral 
order [within visible spectrum]), and for the most part this is a useful 
keyspace.   Libraries just choose to impose a category/numbering at the 
head of the key.

Would there be support for a more general cmp/__cmp__

Would there be support for a more general cmp/cmp