[Numpy-discussion] nan_to_num and bool arrays

Fri Dec 11 19:03:55 EST 2009

On Fri, Dec 11, 2009 at 3:44 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
> On Fri, Dec 11, 2009 at 2:22 PM, Robert Kern <robert.kern at gmail.com> wrote:
>> On Fri, Dec 11, 2009 at 16:09, Keith Goodman <kwgoodman at gmail.com> wrote:
>>> On Fri, Dec 11, 2009 at 1:14 PM, Robert Kern <robert.kern at gmail.com> wrote:
>>>> On Fri, Dec 11, 2009 at 14:41, Keith Goodman <kwgoodman at gmail.com> wrote:
>>>>> On Fri, Dec 11, 2009 at 12:08 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>>>>
>>>>>> So I agree that it should leave the input untouched when a non-float
>>>>>> dtype is used for some array-like input.
>>>>>
>>>>> Would only one line need to be changed? Would changing
>>>>>
>>>>> if not issubclass(t, _nx.integer):
>>>>>
>>>>> to
>>>>>
>>>>> if not issubclass(t, _nx.integer) and not issubclass(t, _nx.bool_):
>>>>>
>>>>> do the trick?
>>>>
>>>> That still leaves strings, voids, and objects. I recommend:
>>>>
>>>>  if issubclass(t, _nx.inexact):
>>>>
>>>> Arguably, one should handle nan float objects in object arrays and
>>>> float columns in structured arrays, but the current code does not
>>>> handle either of those anyways.
>>>
>>> Without your change both
>>>
>>>>> np.nan_to_num(np.array([True, False]))
>>>>> np.nan_to_num([1])
>>>
>>> raise exceptions. With your change:
>>>
>>>>> np.nan_to_num(np.array([True, False]))
>>>   array([ True, False], dtype=bool)
>>>>> np.nan_to_num([1])
>>>   array([1])
>>
>> I think this is correct, though the latter one happens by accident.
>> Lists don't have a .dtype attribute so obj2sctype(type([1])) is
>> checked and happens to be object_. The latter line is intended to
>> handle scalars, not sequences. I think that sequences should be
>> coerced to arrays for output and this check should be more explicit
>> about what it handles. [1.0] will have a problem if you don't.
>
> That makes sense. But I'm not smart enough to implement it.
>
>>> On a separate note, this seems a little awkward:
>>>
>>>>> np.nan_to_num(1.0)
>>>   1.0
>>>>> np.nan_to_num(1)
>>>   array(1)
>>>>> x = np.ones(1, dtype=np.int)
>>>>> np.nan_to_num(x[0])
>>>   1
>>
>> Worth fixing.
>
> Would this work?
>
> def nan_to_num(x):
>    try:
>        t = x.dtype.type
>    except AttributeError:
>        t = obj2sctype(type(x))
>    if issubclass(t, _nx.complexfloating):
>        return nan_to_num(x.real) + 1j * nan_to_num(x.imag)
>    else:
>        try:
>            y = x.copy()
>        except AttributeError:
>            y = array(x)
>    if not y.shape:
>        y = array([x])
>        scalar = True
>    else:
>        scalar = False
>    if issubclass(t, _nx.inexact):
>        are_inf = isposinf(y)
>        are_neg_inf = isneginf(y)
>        are_nan = isnan(y)
>        maxf, minf = _getmaxmin(y.dtype.type)
>        y[are_nan] = 0
>        y[are_inf] = maxf
>        y[are_neg_inf] = minf
>    if scalar:
>        y = y[0]
>    return y
>
> Instead of
>
>>> nan_to_num(1.0)
>   1.0
>>> nan_to_num(1)
>   array(1)
>>> nan_to_num(np.array(1.0))
>   1.0
>>> nan_to_num(np.array(1))
>   array(1)
>
> it gives
>
>>> nan_to_num(1.0)
>   1.0
>>> nan_to_num(1)
>   1
>>> nan_to_num(np.array(1.0))
>   1.0
>>> nan_to_num(np.array(1))
>   1
>
> I guess a lot of unit tests need to be written before nan_to_num can
> be fixed. But for now, your bool fix is an improvement.
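Keith's version detects a scalar with `not y.shape`, wraps it with `array([x])`, and unwraps it with `y[0]`. A shorter way to get a scalar back is to index the 0-d array with an empty tuple. This sketch assumes that a NumPy scalar such as `np.int64` (rather than a Python `int`) is an acceptable result. `as_scalar_if_0d` is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def as_scalar_if_0d(y):
    # Hypothetical helper: coerce to an ndarray, then turn a 0-d array
    # into a NumPy scalar with y[()]; n-d arrays are returned as arrays.
    y = np.asarray(y)
    return y[()] if y.ndim == 0 else y
```

With this, `1`, `np.array(1)`, `1.0` and `np.array(1.0)` all come back as scalars, which is the consistency Keith's proposal is aiming for.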

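Ack! The "if issubclass(t, _nx.inexact)" fix doesn't work. It solves
the bool problem, but it introduces a problem of its own: a list such as
[np.inf] has no dtype, so it is classified as numpy.object_, and
numpy.object_ is not a subclass of inexact. The inf is left in place: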

>> nan_to_num([np.inf])
   array([ Inf])
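The failure is a question about NumPy's scalar-type hierarchy. `bool_`, `str_`, `void` and `object_` all sit outside `integer`, so the original `not issubclass(t, _nx.integer)` test sends them into the nan/inf replacement code. `object_` also sits outside `inexact`, which is why a list classified as `object_` is skipped. The sketch below tabulates both checks for a few types and shows that coercing the list first (Robert's point about sequences) gives `float64`, which is inexact. It only inspects types and does not call `nan_to_num`.

```python
import numpy as np

# Scalar types that came up in the thread.
CANDIDATES = [np.bool_, np.int64, np.str_, np.void, np.object_,
              np.float64, np.complex128]

# For each type: (passes "is integer", passes "is inexact").
checks = {t: (issubclass(t, np.integer), issubclass(t, np.inexact))
          for t in CANDIDATES}

for t, (is_int, is_inexact) in checks.items():
    print(f"{t.__name__:<12} integer={is_int!s:<6} inexact={is_inexact}")

# Coercing the list first gives it a real element dtype (float64)
# instead of classifying it by the list's own Python type.
coerced = np.asarray([np.inf]).dtype.type
print("coerced [np.inf]:", coerced.__name__, issubclass(coerced, np.inexact))
```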

Yeah, way too many special cases here to do this without full unit
test coverage.
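As a possible starting point for that test coverage, here is a hypothetical `nan_to_num_sketch`. It is not NumPy's implementation. It combines Robert's two suggestions (coerce the input to an array first, then act only on `inexact` dtypes) with Keith's goal of returning a scalar for scalar input. The sketch assumes NumPy scalars (such as `np.int64`) are acceptable results for scalar input, and it uses `np.finfo` for the largest and smallest finite values.

```python
import numpy as np

def nan_to_num_sketch(x):
    """Hypothetical rewrite, not NumPy's code: replace nan with 0 and
    +/-inf with the largest/smallest finite value of the dtype, return a
    copy of non-inexact input (bool, int, str, void, object) with its
    values unchanged, and return a scalar for scalar or 0-d input."""
    # Copy into an ndarray so sequences such as [1.0] or [np.inf] are
    # classified by their element dtype, not by type(x).
    y = np.array(x)
    t = y.dtype.type
    if issubclass(t, np.inexact):
        # For complex input, fix the real and imaginary parts separately;
        # y.real and y.imag are writable views into y.
        if issubclass(t, np.complexfloating):
            parts = (y.real, y.imag)
        else:
            parts = (y,)
        info = np.finfo(y.real.dtype)
        for p in parts:
            np.copyto(p, 0, where=np.isnan(p))
            np.copyto(p, info.max, where=np.isposinf(p))
            np.copyto(p, info.min, where=np.isneginf(p))
    # Indexing a 0-d array with () yields a NumPy scalar.
    return y[()] if y.ndim == 0 else y
```

Each case raised in the thread (a bool array, `[1]`, `1`, `1.0`, `np.array(1)`, `[np.inf]`, complex input) can then be pinned down with a one-line assertion such as `assert nan_to_num_sketch([np.inf])[0] == np.finfo(np.float64).max`.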