[Numpy-discussion] Regression: in-place operations (possibly intentional)

Charles R Harris charlesr.harris at gmail.com
Tue Sep 18 17:07:48 EDT 2012


On Tue, Sep 18, 2012 at 2:52 PM, Benjamin Root <ben.root at ou.edu> wrote:

>
>
> On Tue, Sep 18, 2012 at 4:42 PM, Charles R Harris <
> charlesr.harris at gmail.com> wrote:
>
>>
>>
>> On Tue, Sep 18, 2012 at 2:33 PM, Travis Oliphant <travis at continuum.io> wrote:
>>
>>>
>>> On Sep 18, 2012, at 2:44 PM, Charles R Harris wrote:
>>>
>>>
>>>
>>> On Tue, Sep 18, 2012 at 1:35 PM, Benjamin Root <ben.root at ou.edu> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Sep 18, 2012 at 3:25 PM, Charles R Harris <
>>>> charlesr.harris at gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Sep 18, 2012 at 1:13 PM, Benjamin Root <ben.root at ou.edu> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 18, 2012 at 2:47 PM, Charles R Harris <
>>>>>> charlesr.harris at gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 18, 2012 at 11:39 AM, Benjamin Root <ben.root at ou.edu> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Sep 17, 2012 at 9:33 PM, Charles R Harris <
>>>>>>>> charlesr.harris at gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Sep 17, 2012 at 3:40 PM, Travis Oliphant <
>>>>>>>>> travis at continuum.io> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sep 17, 2012, at 8:42 AM, Benjamin Root wrote:
>>>>>>>>>>
>>>>>>>>>> > Consider the following code:
>>>>>>>>>> >
>>>>>>>>>> > import numpy as np
>>>>>>>>>> > a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
>>>>>>>>>> > a *= float(255) / 15
>>>>>>>>>> >
>>>>>>>>>> > In v1.6.x, this yields:
>>>>>>>>>> > array([17, 34, 51, 68, 85], dtype=int16)
>>>>>>>>>> >
>>>>>>>>>> > But in master, this throws an exception about failing to cast
>>>>>>>>>> > via same_kind.
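>>>>>>>>>> > (roughly "TypeError: Cannot cast ufunc 'multiply' output from
>>>>>>>>>> > dtype('float64') to dtype('int16') with casting rule
>>>>>>>>>> > 'same_kind'", if I'm remembering the wording right).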
>>>>>>>>>> >
>>>>>>>>>> > Note that numpy was smart about this operation before, consider:
>>>>>>>>>> > a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
>>>>>>>>>> > a *= float(128) / 256
>>>>>>>>>>
>>>>>>>>>> > yields:
>>>>>>>>>> > array([0, 1, 1, 2, 2], dtype=int16)
>>>>>>>>>> >
>>>>>>>>>> > Of course, this is different from doing the operation out of
>>>>>>>>>> > place:
>>>>>>>>>> > np.array([1, 2, 3, 4, 5], dtype=np.int16) * 0.5
>>>>>>>>>> >
>>>>>>>>>> > which yields an array with floating point dtype in both
>>>>>>>>>> > versions.  I can appreciate the arguments for preventing this
>>>>>>>>>> > kind of implicit casting between non-same_kind dtypes, but I
>>>>>>>>>> > would argue that because the operation is in-place, I (as the
>>>>>>>>>> > programmer) am explicitly stating that I want the current array
>>>>>>>>>> > to store the results of the operation, dtype and all.  Obviously,
>>>>>>>>>> > we can't turn this rule off completely (for example, an in-place
>>>>>>>>>> > addition between an integer array and a datetime64 makes no
>>>>>>>>>> > sense), but surely there is some sort of happy medium that would
>>>>>>>>>> > allow this sort of operation to take place?
>>>>>>>>>> >
>>>>>>>>>> > Lastly, if it is decided that in-place operations should
>>>>>>>>>> > continue working as they did before, I would like to see such a
>>>>>>>>>> > fix in v1.7, because if it isn't in 1.7, then other libraries
>>>>>>>>>> > (such as matplotlib, where this issue was first found) would
>>>>>>>>>> > have to change their code anyway just to be compatible with
>>>>>>>>>> > numpy.
>>>>>>>>>>
>>>>>>>>>> I agree that in-place operations should allow different casting
>>>>>>>>>> rules.  There are different opinions on this, of course, but generally this
>>>>>>>>>> is how NumPy has worked in the past.
>>>>>>>>>>
>>>>>>>>>> We did decide to change the default casting rule to "same_kind"
>>>>>>>>>> but making an exception for in-place seems reasonable.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think that in these cases same_kind will flag what are most
>>>>>>>>> likely programming errors and sloppy code. It is easy to be
>>>>>>>>> explicit, and doing so makes the code more readable because it is
>>>>>>>>> immediately obvious what the multiplicand is, without the need to
>>>>>>>>> recall what the numpy casting rules are in this exceptional case.
>>>>>>>>> ISTR several mentions of this before (Gael?), and in some of those
>>>>>>>>> cases it turned out that real bugs were being exposed. Catching
>>>>>>>>> bugs with minimal effort is a good thing.
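>>>>>>>>>
>>>>>>>>> A made-up but representative example of the sort of bug it
>>>>>>>>> catches:
>>>>>>>>>
>>>>>>>>> a = np.zeros(3, dtype=np.int32)
>>>>>>>>> a += 0.5   # under "unsafe" this silently truncates to a no-op
>>>>>>>>>
>>>>>>>>> Under "same_kind" that line raises instead of quietly doing
>>>>>>>>> nothing.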
>>>>>>>>>
>>>>>>>>> Chuck
>>>>>>>>>
>>>>>>>>>
>>>>>>>> True, it is quite likely to be a programming error, but then again,
>>>>>>>> there are many cases where it isn't.  Is the problem strictly that we are
>>>>>>>> trying to downcast the float to an int, or is it that we are trying to
>>>>>>>> downcast to a lower precision?  Is there a way for one to explicitly relax
>>>>>>>> the same_kind restriction?
>>>>>>>>
>>>>>>>
>>>>>>> I think the problem is downcasting across kinds, with the result
>>>>>>> that floats are truncated and the imaginary parts of complex numbers
>>>>>>> might be discarded. That is, the value, not just the precision, of
>>>>>>> the result changes. So I'd favor an explicit cast in code like this,
>>>>>>> i.e., cast the rhs to an integer.
>>>>>>>
>>>>>>> It is true that this forces downstream projects to code to a higher
>>>>>>> standard, but I don't see that as a bad thing, especially if it
>>>>>>> exposes bugs. And it isn't difficult to fix.
>>>>>>>
>>>>>>> Chuck
>>>>>>>
>>>>>>>
>>>>>> Mind you, in my case, casting the rhs to an integer before doing the
>>>>>> multiplication would be a bug, since our value for the rhs is usually
>>>>>> between zero and one.  Multiplying first by the integer numerator
>>>>>> before dividing by the integer denominator would likely overflow the
>>>>>> 16-bit integer.
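>>>>>>
>>>>>> A made-up but representative case, using the 255/15 scaling from the
>>>>>> start of the thread:
>>>>>>
>>>>>> a = np.array([200], dtype=np.int16)
>>>>>> a *= 255   # 200 * 255 = 51000 > 32767, wraps negative
>>>>>> a //= 15   # garbage from here on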
>>>>>>
>>>>>>
>>>>> For the case in point I'd do
>>>>>
>>>>> In [1]: a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
>>>>>
>>>>> In [2]: a //= 2
>>>>>
>>>>> In [3]: a
>>>>> Out[3]: array([0, 1, 1, 2, 2], dtype=int16)
>>>>>
>>>>> Although I expect you would want something different in practice, the
>>>>> current code already looks fragile to me, and I think it is a good
>>>>> thing you are taking a closer look at it. If you really intend to go
>>>>> through a float, then it should be something like
>>>>>
>>>>> a = (a * (float(128) / 256)).astype(np.int16)
>>>>>
>>>>> Chuck
>>>>>
>>>>>
>>>> And thereby losing the memory benefit of an in-place multiplication?
>>>>
>>>
>>> What makes you think you are getting that? I'd have to check the numpy
>>> C source, but I expect the multiplication is handled just as I wrote it
>>> out. I don't recall any loops that handle mixed types like that. I'd
>>> like to see some, though; scaling integers is a common problem.
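>>>
>>> That is, conceptually something like this (my guess at the equivalent,
>>> not the actual C):
>>>
>>> tmp = a.astype(np.float64)   # intermediate float array
>>> tmp *= 0.5
>>> a[...] = tmp                 # truncating store back into the int16 buffer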
>>>
>>>
>>>
>>>
>>>> That is sort of the point of all this.  We are using 16-bit integers
>>>> because we wanted to be as efficient as possible and didn't need
>>>> anything larger.  Note that this is what we changed the code to; I am
>>>> just wondering if we are being too cautious.  The casting kwarg looks
>>>> to be what I might want, though it isn't as clean as just writing an
>>>> "*=" statement.
>>>>
>>>>
>>> I think even there you will have an intermediate float array followed by
>>> a cast.
>>>
>>>
>>> This is true, but it is done in chunks of a fixed size (controllable by
>>> a thread-local variable or keyword argument to the ufunc).
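>>>
>>> (If I remember the API correctly, np.setbufsize(8192) adjusts that
>>> internal buffer size, and np.getbufsize() queries it.)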
>>>
>>> How difficult would it be to change in-place operations back to the
>>> "unsafe" default?
>>>
>>
>> Probably not too difficult, but I think it would be a mistake. What
>> keyword argument are you referring to? In the current case, I think what is
>> wanted is a scaling function that will actually do things in place. The
>> matplotlib folks would probably be happier with the result if they simply
>> coded up a couple of small Cython routines to do that.
>>
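>> Roughly this sort of loop, sketched in plain Python for illustration (a
>> real version would be typed Cython or C):
>>
>> def scale_inplace(a, factor):
>>     # elementwise truncating scale, no temporary float array
>>     flat = a.reshape(-1)   # assumes a contiguous array
>>     for i in range(flat.size):
>>         flat[i] = int(flat[i] * factor)
>>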
>> Chuck
>>
>>
> As far as matplotlib is concerned, the problem was solved when we
> reverted a change.  The issue that I am raising is that it was such an
> innocuous and, frankly, obvious change to do an in-place operation in the
> first place.  I have to wonder if we are being overly cautious with
> "same_kind".  You are right, we would probably benefit greatly from
> creating some CXX scaling functions (contrary to popular belief, we don't
> use Cython); however, I would imagine that such a general-purpose
> function would fare better within NumPy.  But, ultimately, Python is
> about there being one right way of doing something, and so I think the
> goal should be a casting rule for in-place operations that is somewhat
> more restrictive than "unsafe", yet still catches the sort of errors
> "same_kind" was catching.  This way, I have one way of doing an in-place
> operation, regardless of the types of my operands.
>
>
I think there can be a difference of opinion as to "right" way. I think
Mark got it right. Making these sort of things explicit lets the programmer
tell the world that, yes, I meant to do that, it isn't something I
overlooked. Remember, you aren't writing code for yourself, you are writing
it for the next person who reads it.

Python is loosely typed, but it compensates for that by having only
doubles, complex doubles, and unlimited-precision integers. Numpy isn't
like that, and for numerical work it is especially important to account
for downcasts because of the potential dangers. The trend in practice has
been to make downcasts more and more explicit.

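For instance, plain Python happily computes 200 * 255 == 51000, while
np.int16(200) * np.int16(255) wraps around to -14536 (possibly with an
overflow warning, depending on the error state). That is exactly the sort
of silent surprise an explicit cast makes visible.
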
Chuck