[Numpy-discussion] Nice float -> integer conversion?

Wed Nov 2 00:02:12 EDT 2011

Hi,

On Sat, Oct 15, 2011 at 12:20 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
> Hi,
>
> On Tue, Oct 11, 2011 at 7:32 PM, Benjamin Root <ben.root at ou.edu> wrote:
>> On Tue, Oct 11, 2011 at 2:06 PM, Derek Homeier
>> <derek at astro.physik.uni-goettingen.de> wrote:
>>>
>>> On 11 Oct 2011, at 20:06, Matthew Brett wrote:
>>>
>>> > Have I missed a fast way of doing nice float to integer conversion?
>>> >
>>> > By nice I mean, rounding to the nearest integer, converting NaN to 0,
>>> > inf, -inf to the max and min of the integer range?  The astype method
>>> > and cast functions don't do what I need here:
>>> >
>>> > In [40]: np.array([1.6, np.nan, np.inf, -np.inf]).astype(np.int16)
>>> > Out[40]: array([1, 0, 0, 0], dtype=int16)
>>> >
>>> > In [41]: np.cast[np.int16](np.array([1.6, np.nan, np.inf, -np.inf]))
>>> > Out[41]: array([1, 0, 0, 0], dtype=int16)
>>> >
>>> > Have I missed something obvious?
>>>
>>> np.[a]round comes closer to what you wish (is there consensus
>>> that NaN should map to 0?), but not quite there, and it's not really
>>> consistent either!
>>>
>>
>> In a way, there is already consensus in the code.  np.nan_to_num() by
>> default converts nans to zero, and the infinities go to very large and very
>> small.
>>
>>     >>> np.set_printoptions(precision=8)
>>     >>> x = np.array([np.inf, -np.inf, np.nan, -128, 128])
>>     >>> np.nan_to_num(x)
>>     array([  1.79769313e+308,  -1.79769313e+308,   0.00000000e+000,
>>             -1.28000000e+002,   1.28000000e+002])
>
> Right - but - we'd still need to round, and take care of the nasty
> issue of thresholding:
>
> >>> x = np.array([np.inf, -np.inf, np.nan, -128, 128])
> >>> x
> array([  inf,  -inf,   nan, -128.,  128.])
> >>> nnx = np.nan_to_num(x)
> >>> nnx
> array([  1.79769313e+308,  -1.79769313e+308,   0.00000000e+000,
>        -1.28000000e+002,   1.28000000e+002])
> >>> np.rint(nnx).astype(np.int8)
> array([   0,    0,    0, -128, -128], dtype=int8)
>
> So, I think nice_round would look something like:
>
> def nice_round(arr, out_type):
>    in_type = arr.dtype.type
>    mx = floor_exact(np.iinfo(out_type).max, in_type)
>    mn = floor_exact(np.iinfo(out_type).min, in_type)
>    nans = np.isnan(arr)
>    out = np.rint(np.clip(arr, mn, mx)).astype(out_type)
>    out[nans] = 0
>    return out
>
> with floor_exact being something like:
>
> https://github.com/matthew-brett/nibabel/blob/range-dtype-conversions/nibabel/floating.py

In case anyone is interested or for the sake of anyone later googling
this thread -

I made a working version of nice_round:

https://github.com/matthew-brett/nibabel/blob/floating-stash/nibabel/casting.py

Docstring:
def nice_round(arr, int_type, nan2zero=True, infmax=False):
    """ Round floating point array `arr` to type `int_type`

    Parameters
    ----------
    arr : array-like
        Array of floating point type
    int_type : object
        Numpy integer type
    nan2zero : {True, False}
        Whether to convert NaN values to zero. Default is True. If False, and
        NaNs are present, raise CastingError
    infmax : {False, True}
        If True, set np.inf values in `arr` to be `int_type` integer maximum
        value, -np.inf as `int_type` integer minimum. If False, merely set infs
        to be numbers at or near the maximum / minimum number in `arr` that can
        be contained in `int_type`. Therefore False gives faster conversion at
        the expense of infs that are further from infinity.

    Returns
    -------
    iarr : ndarray
        of type `int_type`

    Examples
    --------
    >>> nice_round([np.nan, np.inf, -np.inf, 1.1, 6.6], np.int16)
    array([ 0, 32767, -32768, 1, 7], dtype=int16)
    """

It wasn't straightforward to find the right place to clip the array to
stop overflow on casting, but I think it's working and tested now.
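
For anyone who wants the gist without following the link, here is a
minimal sketch of the clip-before-cast idea. It is not the nibabel code:
clip_round_cast is a hypothetical name, it always works in float64, and it
assumes the integer type has at most 32 bits, so that the integer limits
are exactly representable as floats. Wider integer types are where
something like floor_exact becomes necessary.

```python
import numpy as np


def clip_round_cast(arr, int_type):
    """Hypothetical helper: round floats to `int_type` without wraparound."""
    arr = np.asarray(arr, dtype=np.float64)
    info = np.iinfo(int_type)
    # Assumption: both limits of int_type are exact in float64. That holds
    # for integer types up to 32 bits, but not for the int64/uint64 maxima.
    if info.bits > 32:
        raise ValueError("this sketch only handles integer types up to 32 bits")
    # Replace NaN with zero first, so the cast never sees a NaN.
    safe = np.where(np.isnan(arr), 0.0, arr)
    # Clip while still in floating point: +inf saturates to the integer
    # maximum, -inf to the minimum, and out-of-range values cannot wrap.
    clipped = np.clip(safe, info.min, info.max)
    # np.rint rounds to nearest, with ties going to the even integer.
    return np.rint(clipped).astype(int_type)


print(clip_round_cast([np.nan, np.inf, -np.inf, 1.1, 6.6], np.int16))
print(clip_round_cast([-128, 128], np.int8))
```

The point of doing the clip in floating point is that every value is
already inside the integer range by the time astype runs, so 128 becomes
127 for int8 rather than wrapping to -128.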

See y'all,

Matthew