FW: Why float('Nan') == float('Nan') is False

Fri Feb 15 00:15:26 EST 2019

Chris,

I don't wish to continue belaboring this topic but will answer you and then
ignore anything non-essential.

You said:

> You shouldn't be testing floats for identity.

I am not suggesting anyone compare floats. I repeat that a nan is not
anything. Now as a technicality, type() reports it as a float, as there is
no easy way to make an int that is a nan.

Here are multiple ways to make a nan:

>>> f = float("nan")
>>> type(f)
<class 'float'>

Oddly you can make a complex nan, sort of:

>>> c = complex("nan")
>>> c
(nan+0j)
>>> type(c)
<class 'complex'>
>>> c+c
(nan+0j)
>>> c+5
(nan+0j)
>>> c + 5j
(nan+5j)

The above makes me suspect that the underlying implementation sees a complex
number as the combination of two floats.
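
That suspicion can be checked from Python itself. The following is a small
sketch, not a statement about the C internals: it only looks at the .real and
.imag attributes that every complex number exposes, and at cmath.isnan(),
which is documented to return True when either part is a nan.

```python
import cmath
import math

c = complex("nan")

# The two components are exposed as ordinary Python floats.
real_type = type(c.real)
imag_type = type(c.imag)
print(real_type, imag_type)

# Only the real part is a nan here; the imaginary part is 0.0.
print(math.isnan(c.real), math.isnan(c.imag))

# cmath.isnan() is True if either component is a nan.
print(cmath.isnan(c))
```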

Now for a deeper anomaly and please don't tell me I shouldn't do this.

There is also a math.nan that seems to behave the same as numpy.nan with a
little twist. It too is a single object, but it is not the same object as
numpy.nan. I mean there are two objects out there in the python world that
are implemented seemingly independently as well as a third that may also be
a fourth and fifth and ...

I will now make three kinds of nan, twice, and show how they inter-relate
today in the version of python I am using at this moment: version 3.7.1,
hosted by IDLE under Cygwin under the latest Windblows. I suspect my other
versions would do the same.

>>> import math, numpy
>>> nanfloat1 = float("nan")
>>> nanfloat2 = float("nan")
>>> nanmath1 = math.nan
>>> nanmath2 = math.nan
>>> nannumpy1 = numpy.nan
>>> nannumpy2 = numpy.nan
>>> nanfloat1 is nanfloat2
False
>>> nanmath1 is nanmath2
True
>>> nannumpy1 is nannumpy2
True
>>> nanfloat1 is nanmath1
False
>>> nanfloat1 is nannumpy1
False
>>> nanmath1 is nannumpy1
False

This seems a tad inconsistent but perhaps completely understandable. Yet all
three claim to be float ...

>>> list(map(type, [ nanfloat1, nanmath1, nannumpy1 ] ))
[<class 'float'>, <class 'float'>, <class 'float'>]

Now granted, comparing floats is iffy if the floats are computed and often
fails because of the internal bit representation and rounding. But asking if
a copy of a float variable to a new name points to the same internal object
does work:

>>> a = 2.0
>>> b = 2.0
>>> a is b
False
>>> c = a
>>> a is c
True
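
The same point as a script, as a rough sketch. One caveat that is specific
to CPython and not guaranteed by the language: when "a = 2.0" and "b = 2.0"
are compiled together in one file, the compiler may reuse a single constant,
so "a is b" can come out True there even though it was False at the
interactive prompt. Building the floats at run time with float("2.0") avoids
that, and the names a, b and c are just the ones from the transcript above.

```python
# Build the floats at run time so equal literals cannot be merged.
a = float("2.0")
b = float("2.0")
c = a  # binds a second name to the same object; nothing is copied

print(a == b)          # value comparison
print(a is c)          # identity: one object under two names
print(id(a) == id(c))  # id() agrees with "is" while both names are alive
```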

What I see happening here is that math.nan is a real object of some sort
that is instantiated by the math module at a specific location, and
presumably setting anything to it just copies that, sort of.

>>> str(math.nan)
'nan'
>>> dir(math.nan)
['__abs__', '__add__', '__bool__', '__class__', '__delattr__', '__dir__',
'__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__',
'__format__', '__ge__', '__getattribute__', '__getformat__',
'__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__',
'__int__', '__le__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__',
'__new__', '__pos__', '__pow__', '__radd__', '__rdivmod__', '__reduce__',
'__reduce_ex__', '__repr__', '__rfloordiv__', '__rmod__', '__rmul__',
'__round__', '__rpow__', '__rsub__', '__rtruediv__', '__set_format__',
'__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__',
'__truediv__', '__trunc__', 'as_integer_ratio', 'conjugate', 'fromhex',
'hex', 'imag', 'is_integer', 'real']
>>> id(math.nan)
51774064

Every name I set to it reports that same address, so no new object is made;
each name simply refers to the one the math module created.

>>> m = math.nan
>>> id(m)
51774064
>>> n = math.nan
>>> id(n)
51774064
>>> o = m
>>> id(o)
51774064
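
A short sketch of what those id() values imply, using only the math module.
The part about float("nan") returning a new object on each call is how
CPython behaves and is labeled as such in the comments; it is an
implementation detail rather than a promise of the language.

```python
import math

m = math.nan
n = math.nan
o = m

# All three names refer to the one object stored in the math module.
same_object = m is n is o is math.nan
print(same_object)

# Being the same object still does not make a nan equal to itself.
print(m == m)

# float("nan") builds its result at call time. In CPython each call
# returns a new object, but that is an implementation detail.
f1 = float("nan")
f2 = float("nan")
print(f1 is f2, f1 == f2)
```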

Now do the same for the numpy.nan implementation:

>>> str(numpy.nan)
'nan'
>>> dir(numpy.nan)
['__abs__', '__add__', '__bool__', '__class__', '__delattr__', '__dir__',
'__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__',
'__format__', '__ge__', '__getattribute__', '__getformat__',
'__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__',
'__int__', '__le__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__',
'__new__', '__pos__', '__pow__', '__radd__', '__rdivmod__', '__reduce__',
'__reduce_ex__', '__repr__', '__rfloordiv__', '__rmod__', '__rmul__',
'__round__', '__rpow__', '__rsub__', '__rtruediv__', '__set_format__',
'__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__',
'__truediv__', '__trunc__', 'as_integer_ratio', 'conjugate', 'fromhex',
'hex', 'imag', 'is_integer', 'real']
>>> id(numpy.nan)
57329632

The same address is reused for every name:

>>> m = numpy.nan
>>> id(m)
57329632
>>> n = numpy.nan
>>> id(n)
57329632

So the numpy nan is unique. The math nan is something else: a separate
object with an address of its own. Note that m and n now refer to numpy.nan,
not math.nan.

>>> m is n
True

But

>>> m is math.nan
False

Should I give up? No, the above makes some sense as the id() function shows
there were two addresses involved in one case and not the other.

A truly clean implementation might have one copy system-wide as happens with
None or Ellipsis (...) but it seems the development in python went in
multiple directions and is no longer joined. 

A similar test (not shown) with numpy.nan shows that the m and n above are
the same object as each other and as what they copied, because they share an
ID.

The solution is to NOT look at nan except using the appropriate functions.

>>> [ (math.isnan(nothing), numpy.isnan(nothing))
...   for nothing in [ float("nan"), math.nan, numpy.nan ] ]
[(True, True), (True, True), (True, True)]

It seems that at least those two nan checkers work the same on all Not A
Number variants I have tried. So it seems safe to stick with them.
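
Here is a minimal sketch of sticking with the checker, using only the
standard library so it runs without numpy. The helper name is_nan is made up
for this example. It also shows one place where identity leaks in whether
you ask for it or not: the "in" test on a list tries "is" before "==", so
the answer depends on whether you hold the very same nan object.

```python
import math

def is_nan(x):
    """Hypothetical helper: True only for a float that is a nan."""
    return isinstance(x, float) and math.isnan(x)

# Three nans obtained in different ways, none of them via numpy.
nans = [float("nan"), math.nan, float("inf") - float("inf")]
print([is_nan(v) for v in nans])

# A nan is the only float that compares unequal to itself.
print([v != v for v in nans])

# Membership tests check identity first, then equality.
print(math.nan in [math.nan])          # same object, so found
print(float("nan") in [float("nan")])  # two objects, never equal
```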

Let us hope nobody tells us we have yet other implementations of nan out
there!

Avi

-----Original Message-----
From: Python-list <python-list-bounces+avigross=verizon.net at python.org> On
Behalf Of Chris Angelico
Sent: Thursday, February 14, 2019 11:11 PM
To: Python <python-list at python.org>
Subject: Re: FW: Why float('Nan') == float('Nan') is False

On Fri, Feb 15, 2019 at 2:37 PM Avi Gross <avigross at verizon.net> wrote:
> But here is a curiosity. The numpy add-on package has a nan that is 
> UNIQUE so two copies are the same. Read this transcript and see if it 
> might sometimes even be useful while perhaps confusing the heck out of 
> people who assume all nans are the same, or is it all nans are different?
>
> >>> floata = float('nan')
> >>> floatb = float('nan')
> >>> floata, floatb
> (nan, nan)
> >>> floata == floatb
> False
> >>> floata is floatb
> False
>
> >>> numpya = numpy.nan
> >>> numpyb = numpy.nan
> >>> numpya, numpyb
> (nan, nan)
> >>> numpya == numpyb
> False
> >>> numpya is numpyb
> True
>

You shouldn't be testing floats for identity.

>>> x = 2.0
>>> y, z = x+x, x*x
>>> y == z
True
>>> y is z
False

If nan identity is confusing people, other float identity should be just as
confusing. Or, just don't test value types for identity unless you're
actually trying to see if they're the same object.

ChrisA