[Numpy-discussion] Change in scalar upcasting rules for 1.6.x?

Mark Wiebe mwwiebe at gmail.com
Mon Feb 13 23:31:43 EST 2012


On Mon, Feb 13, 2012 at 8:04 PM, Travis Oliphant <travis at continuum.io>wrote:

> I disagree with your assessment of the subscript operator, but I'm sure we
> will have plenty of time to discuss that.  I don't think it's correct to
> compare  the corner cases of the fancy indexing and regular indexing to the
> corner cases of type coercion system.    If you recall, I was quite nervous
> about all the changes you made to the coercion rules because I didn't
> believe you fully understood what had been done before and I knew there was
> not complete test coverage.
>
> It is true that both systems have emerged from a long history and could
> definitely use fresh perspectives which we all appreciate you and others
> bringing.   It is also true that few are aware of the details of how things
> are actually implemented and that there are corner cases that are basically
> defined by the algorithm used (this is more true of the type-coercion
> system than fancy-indexing, however).
>

Likely the only way we will be able to know for certain the extent to which
our opinions are accurate is to actually dig into the code. I think we can
agree, however, that at the very least it could use some performance
improvement. :)


> I think it would have been wise to write those extensive tests prior to
> writing new code.   I'm curious if what you were expecting for the output
> was derived from what earlier versions of NumPy produced.    NumPy has
> never been in a state where you could just re-factor at will and assume
> that tests will catch all intended use cases.   Numeric before it was not
> in that state either.   This is a good goal, and we always welcome new
> tests.    It just takes a lot of time and a lot of tedious work that the
> volunteer labor to this point have not had the time to do.
>

I did put quite a bit of effort into maintaining compatibility, and was
incredibly careful about the change we're discussing. I used something I
suspect you created, the "can cast safely" table here:

http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules

I extended it to more cases including scalar/array combinations of type
promotion, and validated that 1.5 and 1.6 produced the same outputs. The
script I used is here:

https://github.com/numpy/numpy/blob/master/numpy/testing/print_coercion_tables.py

I definitely didn't jump into the change blind, but I did approach it from
a clean perspective with the willingness to try and make things better. I
understand this is a delicate balance to walk, and I'd like to stress that
I didn't take any of the changes I made here lightly.

Very few of us have ever been paid to work on NumPy directly and have often
> been trying to fit in improvements to the code base between other jobs we
> are supposed to be doing.    Of course, you and I are hoping to change that
> this year and look forward to the code quality improving commensurately.
>

Well, everything I did for 1.6 that we're discussing here was volunteer
work too. :)

You and Enthought have all the credit for the later bit where I did get
paid a little bit to do the datetime64 and NA stuff!

Thanks for all you are doing.   I also agree that Rolf and Charles
> have-been and are invaluable in the maintenance and progress of NumPy and
> SciPy.   They deserve as much praise and kudos as anyone can give them.
>

It's great to have you back and active in the community again too. I'm sure
this is improving the moods of many NumPy and SciPy users.

-Mark


>
> -Travis
>
>
>
> On Feb 13, 2012, at 9:40 PM, Mark Wiebe wrote:
>
> I believe the main lessons to draw from this are just how incredibly
> important a complete test suite and staying on top of code reviews are. I'm
> of the opinion that any explicit design choice of this nature should be
> reflected in the test suite, so that if someone changes it years later,
> they get immediate feedback that they're breaking something important.
> NumPy has gradually increased its test suite coverage, and when I dealt
> with the type promotion subsystem, I added fairly extensive tests:
>
>
> https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_numeric.py#L345
>
> Another subsystem which is in a similar state as what the type promotion
> subsystem was, is the subscript operator and how regular/fancy indexing
> work. What this means is that any attempt to improve it that doesn't
> coincide with the original intent years ago can easily break things that
> were originally intended without them being caught by a test. I believe
> this subsystem needs improvement, and the transition to new/improved code
> will probably be trickier to manage than for the dtype promotion case.
>
> Let's try to learn from the type promotion case as best we can, and use it
> to improve NumPy's process. I believe Charles and Ralph have been doing a
> great job of enforcing high standards in new NumPy code, and managing the
> release process in a way that has resulted in very few bugs and regressions
> in the release. Most of these quality standards are still informal,
> however, and it's probably a good idea to write them down in a canonical
> location. It will be especially helpful for newcomers, who can treat the
> standards as a checklist before submitting pull requests.
>
> Thanks,
> -Mark
>
> On Mon, Feb 13, 2012 at 7:11 PM, Travis Oliphant <travis at continuum.io>wrote:
>
>> The problem is that these sorts of things take a while to emerge.  The
>> original system was more consistent than I think you give it credit.  What
>> you are seeing is that most people get NumPy from distributions and are
>> relying on us to keep things consistent.
>>
>> The scalar coercion rules were deterministic and based on the idea that a
>> scalar does not determine the output dtype unless it is of a different
>> kind.   The new code changes that unfortunately.
>>
>> Another thing I noticed is that I thought that int16 <op> scalar float
>> would produce float32 originally.  This seems to have changed, but I need
>> to check on an older version of NumPy.
>>
>> Changing the scalar coercion rules is an unfortunate substantial change
>> in semantics and should not have happened in the 1.X series.
>>
>> I understand you did not get a lot of feedback and spent a lot of time on
>> the code which we all appreciate.   I worked to stay true to the Numeric
>> casting rules incorporating the changes to prevent scalar upcasting due to
>> the absence of single precision Numeric literals in Python.
>>
>> We will need to look in detail at what has changed.  I will write a test
>> to do that.
>>
>> Thanks,
>>
>> Travis
>>
>> --
>> Travis Oliphant
>> (on a mobile)
>> 512-826-7480
>>
>>
>> On Feb 13, 2012, at 7:58 PM, Mark Wiebe <mwwiebe at gmail.com> wrote:
>>
>> On Mon, Feb 13, 2012 at 5:00 PM, Travis Oliphant <travis at continuum.io>wrote:
>>
>>> Hmmm.   This seems like a regression.  The scalar casting API was fairly
>>> intentional.
>>>
>>> What is the reason for the change?
>>>
>>
>> In order to make 1.6 ABI-compatible with 1.5, I basically had to rewrite
>> this subsystem. There were virtually no tests in the test suite specifying
>> what the expected behavior should be, and there were clear inconsistencies
>> where for example "a+b" could result in a different type than "b+a". I
>> recall there being some bugs in the tracker related to this as well, but I
>> don't remember those details.
>>
>> This change felt like an obvious extension of an existing behavior for
>> eliminating overflow, where the promotion changed unsigned -> signed based
>> on the value of the scalar. This change introduced minimal upcasting only
>> in a set of cases where an overflow was guaranteed to happen without that
>> upcasting.
>>
>> During the 1.6 beta period, I signaled that this subsystem had changed,
>> as the bullet point starting "The ufunc uses a more consistent algorithm
>> for loop selection.":
>>
>> http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html
>>
>> The behavior Matthew has observed is a direct result of how I designed
>> the minimization function mentioned in that bullet point, and the algorithm
>> for it is documented in the 'Notes' section of the result_type page:
>>
>> http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html
>>
>> Hopefully that explains it well enough. I made the change intentionally
>> and carefully, tested its impact on SciPy and other projects, and advocated
>> for it during the release cycle.
>>
>> Cheers,
>> Mark
>>
>> --
>>> Travis Oliphant
>>> (on a mobile)
>>> 512-826-7480
>>>
>>>
>>> On Feb 13, 2012, at 6:25 PM, Matthew Brett <matthew.brett at gmail.com>
>>> wrote:
>>>
>>> > Hi,
>>> >
>>> > I recently noticed a change in the upcasting rules in numpy 1.6.0 /
>>> > 1.6.1 and I just wanted to check it was intentional.
>>> >
>>> > For all versions of numpy I've tested, we have:
>>> >
>>> >>>> import numpy as np
>>> >>>> Adata = np.array([127], dtype=np.int8)
>>> >>>> Bdata = np.int16(127)
>>> >>>> (Adata + Bdata).dtype
>>> > dtype('int8')
>>> >
>>> > That is - adding an integer scalar of a larger dtype does not result
>>> > in upcasting of the output dtype, if the data in the scalar type fits
>>> > in the smaller.
>>> >
>>> > For numpy < 1.6.0 we have this:
>>> >
>>> >>>> Bdata = np.int16(128)
>>> >>>> (Adata + Bdata).dtype
>>> > dtype('int8')
>>> >
>>> > That is - even if the data in the scalar does not fit in the dtype of
>>> > the array to which it is being added, there is no upcasting.
>>> >
>>> > For numpy >= 1.6.0 we have this:
>>> >
>>> >>>> Bdata = np.int16(128)
>>> >>>> (Adata + Bdata).dtype
>>> > dtype('int16')
>>> >
>>> > There is upcasting...
>>> >
>>> > I can see why the numpy 1.6.0 way might be preferable but it is an API
>>> > change I suppose.
>>> >
>>> > Best,
>>> >
>>> > Matthew
>>> > _______________________________________________
>>> > NumPy-Discussion mailing list
>>> > NumPy-Discussion at scipy.org
>>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120213/47b7a2d8/attachment.html>


More information about the NumPy-Discussion mailing list