[Numpy-discussion] Moving forward with value based casting

Fri Jun 21 13:55:16 EDT 2019

On Wed, 2019-06-05 at 21:35 -0400, Marten van Kerkwijk wrote:
> Hi Sebastian,
> 
> Tricky! It seems a balance between unexpected memory blow-up and
> unexpected wrapping (the latter mostly for integers). 
> 
> Some comments specifically on your message first, then some more
> general related ones. 
> 
> 1. I'm very much against letting `a + b` do anything other than
> `np.add(a, b)`.
> 2. For python values, an argument for casting by value is that a
> python int can be arbitrarily long; the only reasonable course of
> action for those seems to make them float, and once you do that one
> might as well cast to whatever type can hold the value (at least
> approximately).

Just to throw it in, in the long run, instead of trying to find a
minimal dtype (which is a bit random), simply ignoring the value of the
scalar may actually be the better option.

The reason for this would be code like:
```
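import numpy as np

arr = np.zeros(5, dtype=np.int8)

for i in range(200):
    res = arr + i
    # With value-based casting, the dtype switches from int8 to int16
    # once i no longer fits into int8 (i >= 128).
    print(res.dtype)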
```
Instead, we could try `np.int8(i)` in the loop and raise an error if
that conversion fails. Or, if that is a bit nasty (especially for
interactive usage), we would go with a warning.
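
To make that concrete, here is a rough sketch of the "ignore the value,
use the array dtype, fail if it does not fit" behaviour. The helper name
`add_python_int` is made up for illustration and is not a NumPy function;
the range check uses `np.iinfo` rather than relying on what `np.int8(200)`
does, since that has differed between NumPy versions.

```python
import numpy as np


def add_python_int(arr, value):
    """Hypothetical helper: add a Python int without changing arr's dtype."""
    info = np.iinfo(arr.dtype)
    if not (info.min <= value <= info.max):
        # The proposal: fail loudly instead of upcasting or wrapping.
        raise OverflowError(f"Python int {value} does not fit into {arr.dtype}")
    # Convert explicitly so the result dtype never depends on the value.
    return arr + arr.dtype.type(value)


arr = np.zeros(5, dtype=np.int8)
print(add_python_int(arr, 100).dtype)  # int8

try:
    add_python_int(arr, 200)
except OverflowError as exc:
    print("error:", exc)
```

A warning-based variant would only replace the `raise` with
`warnings.warn(...)` and then decide separately what to return.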

This is not something we need to decide soon, since I think some of the
complexity will remain (i.e. you still need to know whether the scalar
is a floating point number or an integer and change the logic
accordingly).
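
A small illustration of why the kind of the scalar still matters even if
its value is ignored (just a sketch; the array names are arbitrary):

```python
import numpy as np

int_arr = np.zeros(3, dtype=np.int8)
f32_arr = np.zeros(3, dtype=np.float32)

# A Python float pushes an integer array into floating point ...
print((int_arr + 1.0).dtype)  # float64
# ... while a Python int or float leaves a float32 array at float32.
print((f32_arr + 1).dtype)    # float32
print((f32_arr + 1.0).dtype)  # float32
```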

Best,

Sebastian

> 3. Not necessarily preferred, but for casting of scalars, one can get
> more consistent behaviour also by extending the casting by value to
> any array that has size=1.
> 
> Overall, just on the narrow question, I'd be quite happy with your
> suggestion of using type information if available, i.e., only cast
> python values to a minimal dtype. If one uses numpy types, those
> mostly will have come from previous calculations with the same
> arrays, so things will work as expected. And in most memory-limited
> applications, one would do calculations in-place anyway (or, as Tyler
> noted, for power users one can assume awareness of memory and thus
> the incentive to tell explicitly what dtype is wanted - just
> `np.add(a, b, dtype=...)`, no need to create `out`).
> 
> More generally, I guess what I don't like about the casting rules
> generally is that there is a presumption that if the value can be
> cast, the operation will generally succeed. For `np.add` and
> `np.subtract`, this perhaps is somewhat reasonable (though for
> unsigned a bit more dubious), but for `np.multiply` or `np.power` it
> is much less so. (Indeed, we had a long discussion about what to do
> with `int ** power` - now special-casing negative integer powers.)
> Changing this, however, probably really is a bridge too far!
> 
> Finally, somewhat related: I think the largest confusion actually
> results from the `uint64+int64 -> float64` casting.  Should this cast
> to int64 instead?
> 
> All the best,
> 
> Marten