[Numpy-discussion] #2522 numpy.diff fails on unsigned integers

Tue Nov 4 13:19:24 EST 2014

On 2014-11-04 15:06, Todd wrote:
> On Tue, Nov 4, 2014 at 2:50 PM, Sebastian Wagner <sebix at sebix.at> wrote:
>
>     Hello,
>
>     I want to bring up Issue #2522 'numpy.diff fails on unsigned integers
>     (Trac #1929)' [1], as it was responsible for an error in one of our
>     programs. Short explanation of the bug: np.diff performs a subtraction
>     on the input array. If the array has an unsigned integer type and the
>     data contains falling values, the subtraction wraps around
>     (arithmetic underflow).
>
>     >>> np.diff(np.array([0,1,0], dtype=np.uint8))
>     array([  1, 255], dtype=uint8)
>
>     @charris proposed either
>     - adding a note to the docstring, and maybe an example to clarify things
>     - or raising a warning
>     but only after a discussion on the list.
>
>     I would like to start it now, as it is an error which is not easily
>     detectable (no errors or warnings are raised). In our case, a data
>     sequence containing only zeros and ones originally had type f8, like
>     every other one, and was later changed to u4. As the program looked
>     for values == 1 and == -1, it broke silently.
>     In my opinion, a note in the docs is not enough and does not help if
>     the type is changed or set after the program has been written.
>     I'd go for automatic upcasting of uints by default and an option to
>     turn it off, if this behavior is explicitly wanted. This wouldn't be
>     correct from the point of view of a programmer, but most of the users
>     have a scientific background and expect it 'to work', rather than
>     something that is theoretically correct but not convenient. (I count
>     myself in the first group.)
>
>
>
> When you say "automatic upcasting", that would be, for example uint8
> to int16?  What about for uint64?  There is no int128.
The upcast should go to the next bigger signed type, otherwise it would
again result in wrong values. For uint64 we can't do that, so it has to
stay.
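
A minimal sketch of the "next bigger type" rule described here, assuming
the target is the signed integer type with twice the item size. The helper
name next_signed_dtype is hypothetical and not part of NumPy.

```python
import numpy as np

def next_signed_dtype(dtype):
    """Hypothetical helper: signed type able to hold any difference of `dtype`."""
    dtype = np.dtype(dtype)
    if dtype.kind != "u":
        return dtype          # only unsigned integers are affected
    if dtype.itemsize >= 8:
        return dtype          # uint64: there is no int128, so it stays
    # uint8 -> int16, uint16 -> int32, uint32 -> int64
    return np.dtype("i%d" % (2 * dtype.itemsize))

a = np.array([0, 1, 0], dtype=np.uint8)
d = np.diff(a.astype(next_signed_dtype(a.dtype)))
print(d, d.dtype)
```

With the cast in place, the falling step comes out as -1 instead of 255.
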
> Also, when you say "by default", is this only when an overflow is
> detected, or always?
I don't know how I could detect an overflow in the diff function. In the
subtraction itself it should be possible, but that's very deep in the
NumPy internals.
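
For the restricted case of a first-order difference on a 1-D unsigned
array, a check outside the subtraction is possible, because a[i+1] - a[i]
wraps exactly when a[i+1] < a[i]. The sketch below assumes that case only;
the name diff_would_wrap is hypothetical and not part of NumPy.

```python
import numpy as np

def diff_would_wrap(a):
    """Hypothetical check: would a first-order np.diff of 1-D `a` wrap around?"""
    a = np.asarray(a)
    if a.dtype.kind != "u":
        return False  # signed and float types can represent negative steps
    # An unsigned subtraction wraps whenever the later value is smaller.
    return bool(np.any(a[1:] < a[:-1]))

print(diff_would_wrap(np.array([0, 1, 0], dtype=np.uint8)))
print(diff_would_wrap(np.array([0, 1, 2], dtype=np.uint8)))
```

This costs one extra comparison pass over the data, which is one reason
to weigh it against simply upcasting.
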
> How would the option to turn it off be implemented?  An argument to
> np.diff or some sort of global option?
I thought of a parameter upcast_int=True for the function.
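
As an illustration only, such a parameter could be sketched as a wrapper
around np.diff. The function name diff_upcast and the keyword upcast_int
are assumptions taken from this proposal; np.diff itself has no such
argument.

```python
import numpy as np

def diff_upcast(a, n=1, axis=-1, upcast_int=True):
    """Sketch of the proposal: np.diff with optional upcasting of uints."""
    a = np.asarray(a)
    # Upcast uint8/16/32 to the signed type of twice the size.
    # uint64 is left unchanged because no wider integer type exists.
    if upcast_int and a.dtype.kind == "u" and a.dtype.itemsize < 8:
        a = a.astype("i%d" % (2 * a.dtype.itemsize))
    return np.diff(a, n=n, axis=axis)

x = np.array([0, 1, 0], dtype=np.uint8)
print(diff_upcast(x))                    # upcast: signed result
print(diff_upcast(x, upcast_int=False))  # current behaviour: wraps around
```
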