Would there be support for a more general cmp/cmp

Mon Oct 24 09:14:52 EDT 2005

Op 2005-10-24, Steven D'Aprano schreef <steve at REMOVETHIScyber.com.au>:
> On Mon, 24 Oct 2005 09:23:58 +0000, Antoon Pardon wrote:
>
>> Op 2005-10-21, Christopher Subich schreef <csubich.spam.block at spam.subich.block.com>:
>>> Antoon Pardon wrote:
>>>> It would be better if cmp would give an indication it
>>>> can't compare two objects instead of giving incorrect
>>>> and inconsistent results.
>>>
>>> If two objects aren't totally comparable, then using 'cmp' on them is 
>>> ill-defined to begin with.
>> 
>> Where is that documented?
>
> Surely that is so obvious, it doesn't need documenting? If comparisons
> aren't defined between two things ("which is longer, water or fire?") then
> comparing them is ill-defined.
>
> On second thoughts... no, I don't believe it is so obvious. People are
> used to thinking about comparisons in the context of numbers and strings,
> and it won't hurt to remind them that not every pair of objects can be
> compared.

I also think there is the problem that people aren't used to partial
ordering. There is an ordering over sets, it is just not a total
ordering. But that some pairs are uncomparable (meaning that neither
one is smaller or greater) doesn't imply that comparing them is
ill defined.

>>> The Standard Thing To Do is throw an 
>>> exception; see the Highly Obscure Case of the Complex Numbers.
>>>
>>> >>>1 == 1j
>>> False
>>> >>>1 != 1j
>>> True
>>> >>>1 < 1j
>>> Traceback (most recent call last):
>>>    File "<stdin>", line 1, in ?
>>> TypeError: cannot compare complex numbers using <, <=, >, >=
>> 
>> I would say this is the wrong answer. The right answer should
>> be False IMO.
>
> I'm afraid that mathematically you are wrong. Greater and less than is not
> defined for complex numbers.
>
> I think I see where you are coming from: you are reading "1 < 1j" as "is
> one less than one*j", the answer to which is "no". That's a sensible
> answer to a sensible question, except for two problems:
>
> (1) you will upset the mathematical purists, who argue that the English
> sentence is not quite the same as the mathematical relation; and
>
> (2) you will upset the programmers, since your suggestion would create a
> whole class of gotchas and bugs, where people wrongly assume that since
> 1<1j is False, 1>1j should be True.
>
> This is one case where practicality *is* purity.

Well it is a wrong assumption is general. There is nothing impure about
partial ordering. 

>> Look at sets:
>> 
>>>>> set([1]) <=  set([2])
>> False
>>>>> set([2]) <= set([1])
>> False
>
> No, sorry, you are making a mistake. In the context of sets, the <
> operator doesn't mean "less than", it means "is a subset of". It is
> correct to say neither "set([1]) is a subset of set([2])" nor vice versa
> is true. Both are false.

That is IMO irrelevant. The subset relationship is an ordering and as
such has all characteristics of other orderings like "less than",
except that the subset relationship is partial and the less than
relationship is total. How it is called "subset" vs "less than" is
IMO irrelevant. It is about mathematical characteristics.

> By analogy, one can ask, "is the cat inside the box?" and get the answer
> "No", but this does not imply that therefore the box must be inside the
> cat.

Bad analogy, this doesn't define a mathematical ordering, the subset
relationship does.

>> 
>>>>> set([1]) <= set([1,2])
>> True
>>>>> cmp(set([1]), set([1,2]))
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in ?
>> TypeError: cannot compare sets using cmp()
>> 
>> The first set is smaller than the second so acoording to the
>> documentation, I should get a negative number as a result.
>
> No, the first expression tests whether set([1]) is a subset of set([1,2]),
> not whether it is "smaller".

This is quibling with words. Both test whether one comes before the
other is a certain order.

> Smaller and bigger aren't defined for sets.

It is not about smaller or bigger, it is about the general concept
of (mathematical) order. Whether you call this order, subset or
less than is not important. 

>>> then your
>>> _ONLY_ choices, no matter your implementation, is to return some
>>> possibly inconsistent result (a < b == 1, b < c == 1, a < c == 0) or
>>> raise an exception for unapplicable comparisons.
>>>
>>> This isn't a Python problem; this is a problem from formal mathematics.
>> 
>> I think it is python's problem if python gives the wrong results.
>> 
>> when a < b then cmp(a,b) should give a negative number as a result.
>> Python sometimes does not!
>> 
>> when not a < b then cmp(a,b) should not give a negative number as a
>> result. Python sometimes does!
>> 
>> I think those are python problems.
>
> I think you need to give examples of where < or > *defined as comparisons*
> returns the wrong results. The examples from sets do not count, because
> the operators aren't being used for comparisons.

IMO they count, because the subset relationship is a mathematical order
relation just as less than is one.

>>> Personally, I'd favor the "raise an exception" case, which is exactly
>>> what will happen if you define rich comparisons and let cmp throw an
>>> exception.
>> 
>> But I don't want cmp to raise an exception when the two elements are
>> comparable.
>> 
>> That cmp(set([1]), set([2])) would raise an exception, I can understand,
>> although I would prefer it returned None, but why should cmp(set([1]),
>> set([1,2])) raise an exception. There is a perfectly well defined answer
>> for this.
>
> Alas, there is not. First you have to define what "less than" and "greater
> than" means for sets, and there is no well-defined answer to that. They
> certainly don't mean "is a subset" and "is a superset". Nor is it sensible
> to define them in terms of the number of elements:

Look I have to define this for whatever class of objects I want to use
anyway. On what grounds do you argue that using partial orderings
shouldn't work.

Wether you want to call being a subset "less than" or not is not the
issue. If I need it for my program I can certainly define a set class
where "less than" will be like the subset relationship or I can define
other classes where an ordering exists but not a total one.

IMO there is nothing wrong with using the order symbols python already
has, to use ordering on other objects..

> def lt(set1, set2):
>     """Defines less than for sets in terms of the number of
>     elements.""" 
>     return len(set1) < len(set2)
>
> def gt(set1, set2):
>     """Defines greater than for sets in terms of the number of
>     elements."""
>     return len(set1) > len(set2)
>
> py> lt( set([1]), set([1, 2]) )
> True
> py> gt( set([1]), set([1, 2]) )
> False
>
> So far so good. But now:
>
> py> lt( set([1]), set([2]) )
> False
> py> gt( set([1]), set([2]) )
> False
> py> set([1]) == set([2])
> False
>
> So set([1]) is neither less than, nor greater than, nor equal to set([2]).

So? It seems you never have encounted a partial ordering before. What
you have illustrated here is not special among (partial) orderings.
And the world is full of partial orderings. I think it is wrong to
shoehorn everything in a total ordering and that is why I think
python should give programmers a way in handling partial orderings.

>>> Operators that assume comparable objects, like sort, also almost always
>>> assume a total order -- inconsistent operations can give weird results,
>>> while throwing an exception will at least (usually) give some sort of
>>> error that can be caught.
>> 
>> No it wont. the sort method will happily sort a list of sets.
>> 
>>>>> lst
>> [set([1]), set([2])]
>>>>> lst.sort()
>
> That's an accident of implementation. Sorting nows uses only <, rather
> than __cmp__, and since < (subset) for sets is defined, sort blindly goes
> and sorts the list.
>
> However, look at this:
>
> py> from sets import Set as set
> py> L1 = [set([1]), set([2]), set([3])]
> py> L2 = [set([2]), set([3]), set([1])]
>
> Notice that L1 and L2 contain the same elements, just in different orders.
> Sorting those two lists should give the same order, correct?

No. Since we don't have a total ordering, sorting doesn't has to give
the same result. For as far as sorting is defined on this kind of
object, the only thing I would expect is that after a sort the
following condition holds.

for all i,j if i < j then not L[i] > L[j]

> py> L1.sort()
> py> L2.sort()
> py> L1
> [Set([1]), Set([2]), Set([3])]
> py> L2
> [Set([2]), Set([3]), Set([1])]
>
> Should, but doesn't. Oops. That's a bug.
>
> Personally, I argue that sorting is something that you do to lists, and
> that all lists should be sortable regardless of whether the objects within
> them have a total order or not. If not, the list should impose a sensible
> ordering on them, such as (perhaps) lexicographical order ("dictionary
> order").
>
> Look at it this way: books do not have a total ordering. What does it mean
> to say that Tolstoy's "War and Peace" is less than Tom Clancy's "The Hunt
> For Red October"? Are we comparing by title? Author? Contents? Cost of the
> book, dimensions of the book, number of pages, some subjective idea of
> quality, average word length, maximum sentence length, publication date,
> number of protagonists, or what?
>
> And yet libraries somehow find no difficulty in sorting shelves of
> hundreds or thousands of books.

But will all libraries sort the shelves in the same way. As far as I
understand a book gets somekind of code in a library and books are
sorted according to these codes. Now everybody sorts these codes the
same way, but the attibution of the code may differ from library to
library.

-- 
Antoon Pardon

Would there be support for a more general cmp/__cmp__

Would there be support for a more general cmp/cmp