[Numpy-discussion] Regression: in-place operations (possibly intentional)

Tue Sep 18 15:55:16 EDT 2012

On 2012/09/18 9:25 AM, Charles R Harris wrote:
>
>
> On Tue, Sep 18, 2012 at 1:13 PM, Benjamin Root <ben.root at ou.edu
> <mailto:ben.root at ou.edu>> wrote:
>
>
>
>     On Tue, Sep 18, 2012 at 2:47 PM, Charles R Harris
>     <charlesr.harris at gmail.com <mailto:charlesr.harris at gmail.com>> wrote:
>
>
>
>         On Tue, Sep 18, 2012 at 11:39 AM, Benjamin Root <ben.root at ou.edu
>         <mailto:ben.root at ou.edu>> wrote:
>
>
>
>             On Mon, Sep 17, 2012 at 9:33 PM, Charles R Harris
>             <charlesr.harris at gmail.com
>             <mailto:charlesr.harris at gmail.com>> wrote:
>
>
>
>                 On Mon, Sep 17, 2012 at 3:40 PM, Travis Oliphant
>                 <travis at continuum.io <mailto:travis at continuum.io>> wrote:
>
>
>                     On Sep 17, 2012, at 8:42 AM, Benjamin Root wrote:
>
>                      > Consider the following code:
>                      >
>                      > import numpy as np
>                      > a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
>                      > a *= float(255) / 15
>                      >
>                      > In v1.6.x, this yields:
>                      > array([17, 34, 51, 68, 85], dtype=int16)
>                      >
>                      > But in master, this throws an exception about
>                     failing to cast via same_kind.
>                      >
>                      > Note that numpy was smart about this operation
>                     before, consider:
>                      > a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
>                      > a *= float(128) / 256
>
>                      > yields:
>                      > array([0, 1, 1, 2, 2], dtype=int16)
>                      >
>                      > Of course, this is different than if one does it
>                     in a non-in-place manner:
>                      > np.array([1, 2, 3, 4, 5], dtype=np.int16) * 0.5
>                      >
>                      > which yields an array with floating point dtype
>                     in both versions.  I can appreciate the arguments
>                     for preventing this kind of implicit casting between
>                     non-same_kind dtypes, but I argue that because the
>                     operation is in-place, then I (as the programmer) am
>                     explicitly stating that I desire to utilize the
>                     current array to store the results of the operation,
>                     dtype and all.  Obviously, we can't completely turn
>                     off this rule (for example, an in-place addition
>                     between integer array and a datetime64 makes no
>                     sense), but surely there is some sort of happy
>                     medium that would allow these sort of operations to
>                     take place?
>                      >
>                      > Lastly, if it is determined that it is desirable
>                     to allow in-place operations to continue working
>                     like they have before, I would like to see such a
>                     fix in v1.7 because if it isn't in 1.7, then other
>                     libraries (such as matplotlib, where this issue was
>                     first found) would have to change their code anyway
>                     just to be compatible with numpy.
>
>                     I agree that in-place operations should allow
>                     different casting rules.  There are different
>                     opinions on this, of course, but generally this is
>                     how NumPy has worked in the past.
>
>                     We did decide to change the default casting rule to
>                     "same_kind" but making an exception for in-place
>                     seems reasonable.
>
>
>                 I think that in these cases same_kind will flag what are
>                 most likely programming errors and sloppy code. It is
>                 easy to be explicit and doing so will make the code more
>                 readable because it will be immediately obvious what the
>                 multiplicand is without the need to recall what the
>                 numpy casting rules are in this exceptional case. IISTR
>                 several mentions of this before (Gael?), and in some of
>                 those cases it turned out that bugs were being turned
>                 up. Catching bugs with minimal effort is a good thing.
>
>                 Chuck
>
>
>             True, it is quite likely to be a programming error, but then
>             again, there are many cases where it isn't.  Is the problem
>             strictly that we are trying to downcast the float to an int,
>             or is it that we are trying to downcast to a lower
>             precision?  Is there a way for one to explicitly relax the
>             same_kind restriction?
>
>
>         I think the problem is down casting across kinds, with the
>         result that floats are truncated and the imaginary parts of
>         imaginaries might be discarded. That is, the value, not just the
>         precision, of the rhs changes. So I'd favor an explicit cast in
>         code like this, i.e., cast the rhs to an integer.
>
>         It is true that this forces downstream to code up to a higher
>         standard, but I don't see that as a bad thing, especially if it
>         exposes bugs. And it isn't difficult to fix.
>
>         Chuck
>
>
>     Mind you, in my case, casting the rhs as an integer before doing the
>     multiplication would be a bug, since our value for the rhs is
>     usually between zero and one.  Multiplying first by the integer
>     numerator before dividing by the integer denominator would likely
>     cause issues with overflowing the 16 bit integer.
>
>
> For the case in point I'd do
>
> In [1]: a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
>
> In [2]: a //= 2
>
> In [3]: a
> Out[3]: array([0, 1, 1, 2, 2], dtype=int16)
>
> Although I expect you would want something different in practice. But
> the current code already looks fragile to me and I think it is a good
> thing you are taking a closer look at it. If you really intend going
> through a float, then it should be something like
>
> a = (a*(float(128)/256)).astype(int16)

That's actually what we had been doing for years until a seemingly 
harmless "optimization" snuck in via an unrelated PR.  Fortunately, Ben 
caught it after only a few days.

Eric

>
> Chuck
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>