Would there be support for a more general cmp/__cmp__

Steven D'Aprano steve at REMOVETHIScyber.com.au
Mon Oct 24 08:07:26 EDT 2005


On Mon, 24 Oct 2005 09:23:58 +0000, Antoon Pardon wrote:

> Op 2005-10-21, Christopher Subich schreef <csubich.spam.block at spam.subich.block.com>:
>> Antoon Pardon wrote:
>>> It would be better if cmp would give an indication it
>>> can't compare two objects instead of giving incorrect
>>> and inconsistent results.
>>
>> If two objects aren't totally comparable, then using 'cmp' on them is 
>> ill-defined to begin with.
> 
> Where is that documented?

Surely that is so obvious, it doesn't need documenting? If comparisons
aren't defined between two things ("which is longer, water or fire?") then
comparing them is ill-defined.

On second thoughts... no, I don't believe it is so obvious. People are
used to thinking about comparisons in the context of numbers and strings,
and it won't hurt to remind them that not every pair of objects can be
compared.

>> The Standard Thing To Do is throw an 
>> exception; see the Highly Obscure Case of the Complex Numbers.
>>
>> >>>1 == 1j
>> False
>> >>>1 != 1j
>> True
>> >>>1 < 1j
>> Traceback (most recent call last):
>>    File "<stdin>", line 1, in ?
>> TypeError: cannot compare complex numbers using <, <=, >, >=
> 
> I would say this is the wrong answer. The right answer should
> be False IMO.

I'm afraid that mathematically you are wrong. Greater and less than is not
defined for complex numbers.

I think I see where you are coming from: you are reading "1 < 1j" as "is
one less than one*j", the answer to which is "no". That's a sensible
answer to a sensible question, except for two problems:

(1) you will upset the mathematical purists, who argue that the English
sentence is not quite the same as the mathematical relation; and

(2) you will upset the programmers, since your suggestion would create a
whole class of gotchas and bugs, where people wrongly assume that since
1<1j is False, 1>1j should be True.

This is one case where practicality *is* purity.


> Especially in the following case do I think the TypeError is the wrong
> answer:
> 
>>>> 3 + 0j < 2 + 0j
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: cannot compare complex numbers using <, <=, >, >=

That's an interesting case. A mathematical purist might argue that
ordering is not defined for complex numbers, even if the imaginary
component is zero. A *different* purist might argue:

3 = 3+0j
2 = 2+0j

then since 3 > 2 it should also be true that 3+0j > 2+0j.

I can see merit in both arguments, but I'm inclined to go with the second
argument, if and only if the imaginary component is exactly zero. I take
comfort in the fact that mathematicians allow continued fractions with
zero terms even though dividing by zero is verboten.


> Look at sets:
> 
>>>> set([1]) <=  set([2])
> False
>>>> set([2]) <= set([1])
> False

No, sorry, you are making a mistake. In the context of sets, the <
operator doesn't mean "less than", it means "is a subset of". It is
correct to say neither "set([1]) is a subset of set([2])" nor vice versa
is true. Both are false.

By analogy, one can ask, "is the cat inside the box?" and get the answer
"No", but this does not imply that therefore the box must be inside the
cat.


[snip]

>> So your use-case is already well-defined, and rather simple.  Make
>> __cmp__ raise an exception of your choice,
> 
> That is not well defined. It doesn't allow to distinghuish between
> something went wrong in __cmp__ and and answer that indicates the only
> usefull thing you can say is that they are unequal.

Agreed there: there should be a distinct exception for trying to compare
incomparable objects, rather than ValueError or TypeError.


> Besides I find the following result inconsistent:
> 
>>>> set([1]) <= set([1,2])
> True
>>>> cmp(set([1]), set([1,2]))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: cannot compare sets using cmp()
> 
> The first set is smaller than the second so acoording to the
> documentation, I should get a negative number as a result.

No, the first expression tests whether set([1]) is a subset of set([1,2]),
not whether it is "smaller". Smaller and bigger aren't defined for sets.
Informally, we can say that one set is smaller than another if it has
fewer members, but that's only one possible definition of "smaller" out of
many.

Think, for example, of the set of all US military forces. That has one
member. Compare it to the set of all US presidents whose surname is or was
Bush. That set has two members. Do you think it is sensible definition of
"smaller" to say that the set of US military forces is smaller than the
set of Presidents called Bush?


[snip]

>> then your
>> _ONLY_ choices, no matter your implementation, is to return some
>> possibly inconsistent result (a < b == 1, b < c == 1, a < c == 0) or
>> raise an exception for unapplicable comparisons.
>>
>> This isn't a Python problem; this is a problem from formal mathematics.
> 
> I think it is python's problem if python gives the wrong results.
> 
> when a < b then cmp(a,b) should give a negative number as a result.
> Python sometimes does not!
> 
> when not a < b then cmp(a,b) should not give a negative number as a
> result. Python sometimes does!
> 
> I think those are python problems.

I think you need to give examples of where < or > *defined as comparisons*
returns the wrong results. The examples from sets do not count, because
the operators aren't being used for comparisons.


>> Personally, I'd favor the "raise an exception" case, which is exactly
>> what will happen if you define rich comparisons and let cmp throw an
>> exception.
> 
> But I don't want cmp to raise an exception when the two elements are
> comparable.
> 
> That cmp(set([1]), set([2])) would raise an exception, I can understand,
> although I would prefer it returned None, but why should cmp(set([1]),
> set([1,2])) raise an exception. There is a perfectly well defined answer
> for this.

Alas, there is not. First you have to define what "less than" and "greater
than" means for sets, and there is no well-defined answer to that. They
certainly don't mean "is a subset" and "is a superset". Nor is it sensible
to define them in terms of the number of elements:

def lt(set1, set2):
    """Defines less than for sets in terms of the number of
    elements.""" 
    return len(set1) < len(set2)

def gt(set1, set2):
    """Defines greater than for sets in terms of the number of
    elements."""
    return len(set1) > len(set2)

py> lt( set([1]), set([1, 2]) )
True
py> gt( set([1]), set([1, 2]) )
False

So far so good. But now:

py> lt( set([1]), set([2]) )
False
py> gt( set([1]), set([2]) )
False
py> set([1]) == set([2])
False

So set([1]) is neither less than, nor greater than, nor equal to set([2]).


>> Operators that assume comparable objects, like sort, also almost always
>> assume a total order -- inconsistent operations can give weird results,
>> while throwing an exception will at least (usually) give some sort of
>> error that can be caught.
> 
> No it wont. the sort method will happily sort a list of sets.
> 
>>>> lst
> [set([1]), set([2])]
>>>> lst.sort()

That's an accident of implementation. Sorting nows uses only <, rather
than __cmp__, and since < (subset) for sets is defined, sort blindly goes
and sorts the list.

However, look at this:

py> from sets import Set as set
py> L1 = [set([1]), set([2]), set([3])]
py> L2 = [set([2]), set([3]), set([1])]

Notice that L1 and L2 contain the same elements, just in different orders.
Sorting those two lists should give the same order, correct?

py> L1.sort()
py> L2.sort()
py> L1
[Set([1]), Set([2]), Set([3])]
py> L2
[Set([2]), Set([3]), Set([1])]

Should, but doesn't. Oops. That's a bug.

Personally, I argue that sorting is something that you do to lists, and
that all lists should be sortable regardless of whether the objects within
them have a total order or not. If not, the list should impose a sensible
ordering on them, such as (perhaps) lexicographical order ("dictionary
order").

Look at it this way: books do not have a total ordering. What does it mean
to say that Tolstoy's "War and Peace" is less than Tom Clancy's "The Hunt
For Red October"? Are we comparing by title? Author? Contents? Cost of the
book, dimensions of the book, number of pages, some subjective idea of
quality, average word length, maximum sentence length, publication date,
number of protagonists, or what?

And yet libraries somehow find no difficulty in sorting shelves of
hundreds or thousands of books.

So in my opinion, sorting is a red-herring. In an ideal world (Python 3
perhaps?) sorting will be divorced from comparisons, or at least seeing
other people. But that still leaves the problem of comparisons.



-- 
Steven.




More information about the Python-list mailing list