Proposal: === and !=== operators

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sun Jul 13 00:48:34 EDT 2014


On Sat, 12 Jul 2014 20:14:32 +0200, Johannes Bauer wrote:

> On 12.07.2014 18:35, Steven D'Aprano wrote:
> 
>> If you said, "for many purposes, one should not compare floats for
>> equality, but should use some sort of fuzzy comparison instead" then I
>> would agree with you. But your insistence that equality "always" is
>> wrong takes it out of good advice into the realm of superstition.
> 
> Bullshit. Comparing floats by their representation is *generally* a bad
> idea because of portability issues. You don't know if IEEE754 is used to
> represent floats on the systems that your code is used on.

How many systems do you know of that support Python but don't support 
IEEE-754? Here's a thread from 2008 which discusses this:

https://mail.python.org/pipermail/python-dev/2008-February/076680.html


If you're running my code on an implementation of Python without IEEE-754 
floats, then I'm quite happy to say "Sorry guys, that's not my problem, 
you're on your own."
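If you want code to at least notice when it is running somewhere exotic, one cheap sanity check is to compare `sys.float_info` against the IEEE-754 binary64 parameters. This is a minimal sketch. `looks_like_ieee754_double` is a hypothetical helper, not a standard function, and a match is only circumstantial evidence: it says nothing about rounding modes or subnormal support.

```python
import sys

def looks_like_ieee754_double():
    """Heuristic: do Python floats have IEEE-754 binary64 parameters?"""
    fi = sys.float_info
    # binary64: radix 2, a 53-bit significand (counting the implicit bit),
    # and the C limits DBL_MAX_EXP == 1024 and DBL_MIN_EXP == -1021.
    return (
        fi.radix == 2
        and fi.mant_dig == 53
        and fi.max_exp == 1024
        and fi.min_exp == -1021
    )

print(looks_like_ieee754_double())
```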

And if you're running an implementation of Python where 1.0 != 1.0, well, 
your system is so broken that there is no hope for it. None whatsoever.


> You're hairsplitting: when I'd have said "in 99.9% of cases" you'd agree
> with me

I never said that.

I would not put a percentage to it, because it depends on the context and 
what you are trying to do. For some uses, exact equality is the right 
solution. For others, an absolute epsilon comparison is better. For still 
others, a relative error or a ULP comparison is the better solution. 
There's no way of putting a percentage to those. You have to understand 
what you are doing, and not just mindlessly follow some superstition.
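To make the distinction concrete, here is a minimal sketch of the four strategies mentioned above, applied to the classic 0.1 + 0.2 versus 0.3 case. The helper names and the 1e-9 default tolerances are arbitrary choices for illustration, not recommendations, and `ulp_distance` assumes finite IEEE-754 doubles.

```python
import struct

def exact_equal(a, b):
    return a == b

def abs_close(a, b, eps=1e-9):
    # Absolute epsilon: only meaningful if you know the scale of a and b.
    return abs(a - b) <= eps

def rel_close(a, b, rel=1e-9):
    # Relative error, measured against the larger magnitude.
    return abs(a - b) <= rel * max(abs(a), abs(b))

def ulp_distance(a, b):
    """Number of representable doubles between a and b (finite inputs)."""
    def ordered(x):
        # Reinterpret the 64 bits of the double as a signed integer...
        (i,) = struct.unpack("<q", struct.pack("<d", x))
        # ...then remap negative floats so integer order matches float
        # order (this also maps -0.0 and +0.0 to the same integer, 0).
        return i if i >= 0 else -(2**63) - i
    return abs(ordered(a) - ordered(b))

a, b = 0.1 + 0.2, 0.3
print(exact_equal(a, b))   # False
print(abs_close(a, b))     # True
print(rel_close(a, b))     # True
print(ulp_distance(a, b))  # 1: they are adjacent doubles
```

Which of these is "right" depends entirely on where the numbers came from, which is the point being made.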

When you mindlessly follow superstition, you end up with bogus warnings 
like this:

https://gcc.gnu.org/ml/gcc/2001-08/msg00853.html



> but since I said "always" you disagree. Don't lawyer out.
> Comparing binary representation of floats is a crappy idea.

Yes. And *not* comparing floats with == is a crappy idea too. *EVERY* 
method of comparing two floats to see if they are the same can break 
under some circumstances. Everything about floats is crappy, except that 
avoiding floats completely is *worse*.

Nevertheless, floats are not magically cursed. They are deterministic, 
for any fixed combination of CPU (or FPU) + machine code, and if you 
understand how floats work, then you can understand when to use exact 
equality and when not to:

http://randomascii.wordpress.com/2012/06/26/doubles-are-not-floats-so-dont-compare-them/
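The linked article's point (a value rounded to single precision is generally not equal to the same literal rounded to double precision) can be reproduced in Python. Python has no native single-precision float, so this sketch simulates one by round-tripping through a 4-byte C float with the `struct` module; `to_float32` is a hypothetical helper name.

```python
import struct

def to_float32(x):
    # Round x to the nearest single-precision value, then widen it back
    # to a Python (double-precision) float.
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(to_float32(0.1) == 0.1)  # False: 0.1 rounds differently in 24 vs 53 bits
print(to_float32(0.5) == 0.5)  # True: 0.5 is exact in both formats
print(to_float32(0.1))         # 0.10000000149011612
```

Both results are deterministic; the comparison that fails is the one mixing precisions.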


Using any sort of fuzzy comparison means that you lose transitivity:

if x == y and y == z then x == z

This holds for any sane floating point system, but it doesn't hold with 
fuzzy comparisons. By default, APL uses fuzzy comparisons instead of 
exact equality. Out of the thousands of programming languages ever 
designed, APL is unique, or at most one of a tiny handful of languages, 
in eschewing exact float equality. Why do you think that is?
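A minimal illustration, using a hypothetical `approx_equal` helper with an absolute tolerance. The same thing happens with any fixed nonzero tolerance, absolute or relative; the deliberately large tolerance here just keeps the numbers exactly representable and easy to check.

```python
def approx_equal(a, b, eps=0.6):
    # Hypothetical absolute-epsilon comparison. eps is exaggerated so the
    # effect shows up with exactly representable values.
    return abs(a - b) <= eps

x, y, z = 0.0, 0.5, 1.0
print(approx_equal(x, y))  # True
print(approx_equal(y, z))  # True
print(approx_equal(x, z))  # False: "equal" is no longer transitive
```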

The idea of tolerant comparisons and fuzzy functions is a fundamental 
design feature of APL:

http://www.jsoftware.com/papers/satn23.htm

Nevertheless, even in APL there are uses for setting the tolerance to zero 
(i.e. to get exact equality). Robert Bernecky gives one such example, and 
writes "In such a search exact comparison is absolutely necessary."


[Aside: I note that despite providing fuzzy comparison functions, and a 
system variable that controls the amount of fuzz, APL merely pushes the 
decision of how much fuzz is appropriate onto the user:

"In general, ⎕ct should be chosen to be the smallest value which is large 
enough to mask common arithmetic errors."

And what about uncommon arithmetic errors, I wonder? But I digress.]
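For readers who don't speak APL: tolerant equality is commonly described as treating x and y as equal when |x - y| <= ⎕ct × max(|x|, |y|), i.e. a relative tolerance. The sketch below is an approximation of that rule, not a faithful reimplementation of any APL; `tolerant_eq` and its `ct` parameter (standing in for ⎕ct) are hypothetical names. Python 3.5 and later offer a similar test as `math.isclose(x, y, rel_tol=..., abs_tol=0.0)`.

```python
def tolerant_eq(x, y, ct):
    # Relative test: |x - y| <= ct * max(|x|, |y|).
    # With ct == 0 this succeeds only when x == y (for finite x and y).
    return abs(x - y) <= ct * max(abs(x), abs(y))

a, b = 1.0, 1.0 + 1e-15
print(tolerant_eq(a, b, 1e-13))  # True: within the tolerance
print(tolerant_eq(a, b, 0.0))    # False: tolerance zero means exact comparison
```

Note that the sketch still leaves the hard part, choosing `ct`, to the caller, just as the aside complains.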


> Even more so in the age of cloud computing where your code is executed
> on who knows which architecture where the exact same high level
> interpretation might lead to vastly different results. 

If so, then that is a bug in the cloud computing platform. Not my 
problem. Complain to the provider.


> Not to mention
> high performance computing, where specialized FPUs can numerously be
> found which don't give a shit about IEEE754.

Why should I support such broken platforms? If I run Python code on some 
horrible platform which only checks the first 8 characters of a string 
for equality "for performance reasons":

if "monumentless" == "monumental":
    print("Your Python is broken")

we'd all agree that the implementation was broken. Failure to meet at 
least the semantics of CPython floats is likewise broken.


> Another thing why it's good to NEVER compare floats with regards to
> their binary representation: Do you exactly know how your FPU is
> configured by your operating system. Do you know that your FPUs on a
> multiprocessor system are configured all identically with regards to
> 754? Rounding modes, etc?
> 
> Just don't fall in the pit. Don't compare floats via equals.

And this is why that advice is purest superstition. If you don't compare 
floats via equals, what are you supposed to do? Compare via an absolute 
epsilon? Which epsilon? Do you know if (x - y) is correctly rounded?
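"Which epsilon?" is a real question, not a rhetorical one. A fixed absolute epsilon only means something relative to the magnitude of the numbers involved. In this sketch the helper and the 1e-9 value are arbitrary, and it gets both extremes wrong:

```python
EPS = 1e-9  # a "typical" absolute epsilon, chosen arbitrarily

def abs_close(a, b, eps=EPS):
    return abs(a - b) <= eps

# Tiny values: 1e-12 and 2e-12 differ by a factor of two, yet compare "equal".
print(abs_close(1e-12, 2e-12))           # True

# Large values: 1e20 + 2**14 is the very next double after 1e20, so the two
# are as close as distinct doubles can be, yet they compare "not equal".
print(abs_close(1e20, 1e20 + 16384.0))   # False
```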

In the bad old days of the 1960s and 70s, there used to be systems 
where x - y would underflow to zero even though x != y. If you're telling 
me that in 2014 I should write my Python code to support such ancient 
machines (machines that don't even have a Python interpreter, I might 
add!), I'm just going to laugh at you. No, I don't have to support such 
machines, or their more modern (but equally broken) equivalents.
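IEEE-754's gradual underflow (subnormal numbers) is what rules out that particular failure: for finite x and y, x - y == 0 exactly when x == y. A small sketch, assuming the FPU has not been configured to flush subnormals to zero:

```python
import sys

x = sys.float_info.min  # smallest positive *normal* double, 2**-1022
y = 1.5 * x             # exactly representable, and y != x

diff = y - x            # 2**-1023, which lies below the normal range
print(diff != 0.0)                # True: gradual underflow keeps it nonzero
print(diff < sys.float_info.min)  # True: the result is subnormal
```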

The same fears of buggy float implementations and misconfigured FPUs that 
you use to argue against floating point == apply to every other technique 
as well. Do you know if (x-y) <= y*err is correctly rounded? What happens 
if x*err overflows? What if x-y overflows? Are you sure that your crappy 
FPU even guarantees that HUGE_NEGATIVE_NUMBER - HUGE_POSITIVE_NUMBER will 
return a negative value instead of overflowing to a positive number?
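On an IEEE-754 platform the overflow question at least has a defined answer (the result keeps its sign and becomes infinite). Even so, a hand-rolled fuzzy test can misbehave at the edges where plain == does not. A sketch, with `rel_close` a hypothetical naive helper:

```python
import math

def rel_close(a, b, rel=1e-9):
    # Hypothetical naive relative-error test.
    return abs(a - b) <= rel * max(abs(a), abs(b))

big = 1e308
print(-big - big)                        # -inf: overflow preserves the sign
print(math.inf == math.inf)              # True
print(rel_close(math.inf, math.inf))     # False: inf - inf is nan, and nan <= x is False
print(math.isclose(math.inf, math.inf))  # True: the stdlib special-cases infinities
```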


>>> when x < x -> False
>> 
>> Because not all types are ordered:
>> 
>> py> x = 1+3j
>> py> x < x
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: unorderable types: complex() < complex()
> 
> Oh, so then you also don't want reflexivity of equals, I think.
> Because, obviously, not all types support comparison for equality:
> 
> #!/usr/bin/python3
> class Yeah(object):
> 	def __eq__(self, other):
> 		raise TypeError("Booya")
> Yeah() == Yeah()
> 
> You cherrypick your logic and hairsplit in your reasoning. It's not
> consistent.

Who says it has to be consistent? Being consistent is a Nice To Have, but 
not a Must Have.

"A foolish consistency is the hobgoblin of little minds, adored by little 
statesmen and philosophers and divines."  -- Ralph Waldo Emerson.


Practicality beats purity. If you want some sort of pure logic language, 
then Python is not the language for you. In Python, there are good, 
useful, practical reasons for the built-in containers to assume 
reflexivity of equality but not of other order comparisons.
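NaN is the standard example of that trade-off. It is not equal to itself, yet the language reference specifies that membership tests on built-in containers check identity before calling ==, and CPython's list comparison and dict lookup do the same. In effect they assume reflexivity. A short demonstration:

```python
nan = float("nan")

print(nan == nan)                        # False: NaN is not equal to itself
print(nan in [nan])                      # True: containment checks identity first
print([nan] == [nan])                    # True: the same object sits in both lists
print([float("nan")] == [float("nan")])  # False: two distinct NaN objects
print({nan: 1}[nan])                     # 1: dict lookup also tries identity first
```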



-- 
Steven


