[Numpy-discussion] Moving forward with value based casting
Sebastian Berg
sebastian at sipsolutions.net
Fri Jun 21 13:55:16 EDT 2019
On Wed, 2019-06-05 at 21:35 -0400, Marten van Kerkwijk wrote:
> Hi Sebastian,
>
> Tricky! It seems a balance between unexpected memory blow-up and
> unexpected wrapping (the latter mostly for integers).
>
> Some comments specifically on your message first, then some more
> general related ones.
>
> 1. I'm very much against letting `a + b` do anything else than
> `np.add(a, b)`.
> 2. For python values, an argument for casting by value is that a
> python int can be arbitrarily long; the only reasonable course of
> action for those seems to make them float, and once you do that one
> might as well cast to whatever type can hold the value (at least
> approximately).
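The "cast a Python int to whatever type can hold it" idea from point 2 can be sketched as follows. This is a simplified, hypothetical helper (`minimal_dtype_for_int` is not a NumPy function). It only tries the signed integer types and does not reproduce NumPy's exact legacy rules. It falls back to float64 once no fixed-width signed integer is wide enough.

```python
import numpy as np

# Hypothetical helper, NOT NumPy's implementation: picks the smallest signed
# integer dtype that can hold a Python int, else falls back to float64
# (which holds the value only approximately).
def minimal_dtype_for_int(value: int) -> np.dtype:
    for candidate in (np.int8, np.int16, np.int32, np.int64):
        info = np.iinfo(candidate)
        if info.min <= value <= info.max:
            return np.dtype(candidate)
    # Too large for any fixed-width signed integer type.
    return np.dtype(np.float64)

print(minimal_dtype_for_int(100))    # int8
print(minimal_dtype_for_int(200))    # int16
print(minimal_dtype_for_int(2**70))  # float64
```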
Just to throw it in, in the long run, instead of trying to find a
minimal dtype (which is a bit random), simply ignoring the value of the
scalar may actually be the better option.
The reason for this would be code like:
```
import numpy as np

arr = np.zeros(5, dtype=np.int8)
for i in range(200):
    res = arr + i
    print(res.dtype)  # switches from int8 to int16 once i no longer fits into int8!
```
Instead, NumPy would try `np.int8(i)` in the loop and raise an error if that fails.
Or, if that is a bit nasty, especially for interactive usage, we
could give a warning instead.
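A minimal sketch of this "ignore the value, but check that it fits" proposal for integer arrays might look like the following. The function name `add_int_scalar` and its `warn` flag are hypothetical illustrations, not NumPy API. It assumes `arr` has an integer dtype. The result always keeps the array's dtype. An out-of-range Python int either raises `OverflowError` or, with `warn=True`, emits a `RuntimeWarning` and wraps.

```python
import warnings
import numpy as np

# Hypothetical sketch: the result dtype is always arr.dtype; the scalar's value
# is only checked for fitting into that dtype, never used to widen it.
def add_int_scalar(arr: np.ndarray, value: int, warn: bool = False) -> np.ndarray:
    info = np.iinfo(arr.dtype)  # assumes an integer array
    if not info.min <= value <= info.max:
        msg = f"{value} does not fit into {arr.dtype}"
        if not warn:
            raise OverflowError(msg)
        warnings.warn(msg, RuntimeWarning)
    # Convert explicitly so no value-based promotion can happen; astype wraps
    # out-of-range values (only reachable here in the warn=True case).
    scalar = np.array(value).astype(arr.dtype)
    return np.add(arr, scalar)

arr = np.zeros(5, dtype=np.int8)
print(add_int_scalar(arr, 100).dtype)  # int8
```

Choosing between the error and the warning is then a single flag, which mirrors the trade-off between strictness and interactive convenience mentioned above.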
This is nothing we need to decide soon, since I think some of the
complexity will remain either way (i.e. you still need to know whether
the scalar is a floating point number or an integer and adjust the logic).
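The remaining complexity, distinguishing the *kind* of the Python scalar even when its value is ignored, can be sketched like this. `result_dtype_ignoring_value` is a hypothetical helper that only handles numeric dtypes. The rule it applies is an assumption for illustration. If the scalar's kind (bool < int < float < complex) is not "higher" than the array's, the array dtype wins. Otherwise it promotes with the default dtype of the scalar's kind via `np.result_type`, still without looking at the value.

```python
import numpy as np

# Rank of each numeric dtype kind: bool < integer < float < complex.
_KIND_RANK = {"b": 0, "u": 1, "i": 1, "f": 2, "c": 3}

# Hypothetical helper: result dtype from the array dtype and only the
# *kind* of the Python scalar, never its value.
def result_dtype_ignoring_value(arr_dtype, scalar) -> np.dtype:
    arr_dtype = np.dtype(arr_dtype)
    # bool must be checked before int, because bool subclasses int.
    if isinstance(scalar, bool):
        scalar_kind = "b"
    elif isinstance(scalar, int):
        scalar_kind = "i"
    elif isinstance(scalar, float):
        scalar_kind = "f"
    elif isinstance(scalar, complex):
        scalar_kind = "c"
    else:
        raise TypeError(f"unsupported scalar type: {type(scalar).__name__}")
    if _KIND_RANK[scalar_kind] <= _KIND_RANK[arr_dtype.kind]:
        # Same or lower kind: the array's dtype wins, whatever the value.
        return arr_dtype
    # Higher kind (e.g. int array + Python float): promote with the default
    # dtype of that Python type, again without inspecting the value.
    return np.result_type(arr_dtype, np.dtype(type(scalar)))

print(result_dtype_ignoring_value(np.int8, 200))  # int8
print(result_dtype_ignoring_value(np.int8, 1.5))  # float64
```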
Best,
Sebastian
> 3. Not necessarily preferred, but for casting of scalars, one can get
> more consistent behaviour also by extending the casting by value to
> any array that has size=1.
>
> Overall, just on the narrow question, I'd be quite happy with your
> suggestion of using type information if available, i.e., only cast
> python values to a minimal dtype. If one uses numpy types, those
> mostly will have come from previous calculations with the same
> arrays, so things will work as expected. And in most memory-limited
> applications, one would do calculations in-place anyway (or, as Tyler
> noted, for power users one can assume awareness of memory and thus
> the incentive to tell explicitly what dtype is wanted - just
> `np.add(a, b, dtype=...)`, no need to create `out`).
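The two options mentioned here, requesting a result dtype with the ufunc's `dtype=` argument versus computing in place with `out=`, behave quite differently on overflow. A small illustration with int8 inputs:

```python
import numpy as np

a = np.array([100, 50], dtype=np.int8)
b = np.array([100, 50], dtype=np.int8)

# Explicitly request a wider result: the inputs are cast to int16 before
# adding, so 100 + 100 does not wrap around.
wide = np.add(a, b, dtype=np.int16)
print(wide, wide.dtype)  # [200 100] int16

# In place: the result is written back into the int8 array and wraps silently.
np.add(a, b, out=a)
print(a)  # [-56 100]
```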
>
> More generally, I guess what I don't like about the casting rules
> is that there is a presumption that if the value can be
> cast, the operation will generally succeed. For `np.add` and
> `np.subtract`, this perhaps is somewhat reasonable (though for
> unsigned a bit more dubious), but for `np.multiply` or `np.power` it
> is much less so. (Indeed, we had a long discussion about what to do
> with `int ** power` - now special-casing negative integer powers.)
> Changing this, however, probably really is a bridge too far!
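A short illustration of why "the inputs can be cast" does not imply "the operation succeeds" for `np.multiply` and `np.power`. Both int8 inputs below fit their dtype, yet the product does not. An integer array raised to a negative integer power is rejected outright.

```python
import numpy as np

a = np.array([100], dtype=np.int8)
b = np.array([3], dtype=np.int8)

# Both inputs fit into int8, and int8 is the result dtype the casting rules
# pick, but 300 does not fit: integer array arithmetic wraps silently.
prod = np.multiply(a, b)
print(prod, prod.dtype)  # [44] int8

# For power, the sign of the exponent matters: NumPy refuses integer
# arrays raised to negative integer powers.
try:
    np.power(np.array([2]), np.array([-1]))
except ValueError as exc:
    print("ValueError:", exc)
```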
>
> Finally, somewhat related: I think the largest confusion actually
> results from the `uint64+int64 -> float64` casting. Should this cast
> to int64 instead?
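For reference, the `uint64 + int64 -> float64` promotion can be seen directly. No integer type holds every value of both, so NumPy picks float64. Its 53-bit significand can then lose exactness for large integers:

```python
import numpy as np

# No integer dtype covers both the uint64 and the int64 range.
print(np.promote_types(np.uint64, np.int64))  # float64

big = np.array([2**53 + 1], dtype=np.uint64)
zero = np.array([0], dtype=np.int64)
res = big + zero
print(res.dtype)  # float64
# float64 cannot represent 2**53 + 1 exactly, so the sum is off by one.
print(int(res[0]) == 2**53 + 1)  # False
```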
>
> All the best,
>
> Marten
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion