[Numpy-discussion] (Value Based Promotion) Current Behaviour

Wed Jun 12 13:03:57 EDT 2019

On Tue, 2019-06-11 at 22:08 -0400, Marten van Kerkwijk wrote:
> HI Sebastian,
> 
> Thanks for the overview! In the value-based casting, what perhaps
> surprises me most is that it is done within a kind; it would seem an
> improvement to check whether a given integer scalar is exactly
> representable in a given float (your example of 1024 in `float16`).
> If we switch to the python-only scalar values idea, I would suggest
> to abandon this. That might make dealing with things like `Decimal`
> or `Fraction` easier as well.
> 

Yeah, one can argue that since we have this "safe casting" based
approach, we should go all the way for the value based logic. I think I
tend to agree, but I am not quite sure right now to be honest.

Fractions and Decimals are very interesting in that they raise the
question what happens to user dtypes [0]. Although, you would still
need a "no lower category" rule, since you do not want 1024. or 12/3 be
demoted to an integer.

For me right now, what is most interesting is what we should do with
ufunc calls, and if we can simplify them. I feel right now we have to
types of ufuncs:

1. Ufuncs which use a "common type", where we can find the minimal type
before dispatching.

2. More complex ufuncs, for which finding the minimal type is trickier
[1]. And while I could not find any weird enough ufunc, I am not sure
that blind promotion is a good idea for general ufuncs.

Best,

Sebastian

[0] A python fraction could be converted to int64/int64 or int32/int32,
etc. depending on the value, in principle. If we want such things to
work in principle, we need machinery (although I expect one could tag
that on later).
[1] It is not impossible, but we need to insert non-existing types into
the type hierarchy.

PS: Another interesting issue is that if we try to move away from value
based casting for numpy scalars, that initial `np.asarray(...)` call
may lose the information that a python integer was passed in. So to
support such things, we might need a whole new machinery.

> All the best,
> 
> Marten
> 
> On Tue, Jun 11, 2019 at 8:46 PM Sebastian Berg <
> sebastian at sipsolutions.net> wrote:
> > Hi all,
> > 
> > strange, something went wrong sending that email, but in any
> > case...
> > 
> > I tried to "summarize" the current behaviour of promotion and value
> > based promotion in numpy (correcting a small error in what I wrote
> > earlier). Since it got a bit long, you can find it here (also copy
> > pasted at the end):
> > 
> > https://hackmd.io/NF7Jz3ngRVCIQLU6IZrufA
> > 
> > Allan's document which I link in there is also very interesting.
> > One
> > thing I had not really thought about before was the problem of
> > commutativity.
> > 
> > I do not have any specific points I want to discuss based on it
> > (but
> > those are likely to come up again later).
> > 
> > All the Best,
> > 
> > Sebastian
> > 
> > 
> > -----------------------------
> > 
> > PS: Below a copy of what I wrote:
> > 
> > ---
> > title: Numpy Value Based Promotion Rules
> > author: Sebastian Berg
> > ---
> > 
> > 
> > 
> > NumPy Value Based Scalar Casting and Promotion
> > ==============================================
> > 
> > This document reviews some of the behaviours of the promotion rules
> > within numpy. This is especially with respect to the promotion of
> > scalars and 0D arrays which inspect the value to decide casting and
> > promotion.
> > 
> > Other documents discussing these things:
> > 
> >   * `from numpy.testing import print_coercion_tables` prints the
> > current promotion tables including value based promotion for small
> > positive/negative scalars.
> >   * Allan Haldane's thoughts on changing casting/promotion to be
> > more
> > C-like and discussing things such as here:
> >     
> > https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76
> >   * Discussion around the problem of uint64 and int64 being
> > promoted to
> > float64: https://github.com/numpy/numpy/issues/12525 (lists many
> > related issues).
> > 
> > 
> > Nomenclature and Defintions
> > ---------------------------
> > 
> > * **dtype/type**: The data type of an array or scalar: `float32`,
> > `float64`, `int8`, …
> > 
> > * **Category**: A category to which the data type belongs, in this
> > context these are:
> >   1. boolean
> >   2. integer (unsigned and signed are not split up here, but are
> > different "kinds")
> >   3. floating point and complex (not split up here but are
> > different
> > "kinds")
> >   5. All others
> > 
> > * **Casting**: converting from one dtype to another. There are four
> > different rules of casting:
> >   1. *"safe"* casting: All values are representable in the new data
> > type. I.e. no information is lost during the conversion.
> >   2. *"same kind"* casting: data loss may occur, but only within
> > the
> > same "kind". For example a float64 can be converted to float32
> > using
> > "same kind" rules, an int64 can be converted to int16. This is
> > although
> > both lose precision or even produce incorrect values. Note that
> > "kind"
> > is different from "category" in that it distinguishes between
> > signed
> > and unsigned integers.
> >   4. *"unsafe"* casting: Any conversion which can be defined, e.g.
> > floating point to integer. For promotion this is fairly
> > unimportant.
> > (Some conversions such as string to integer, which not even work
> > fall
> > in this category, but could also be called coercions or
> > conversions.)
> > 
> > * **Promotion**: The general process of finding a new dtype for
> > multiple input dtypes. Will be used here to also denote any kind of
> > casting/promotion done before a specific function is called. This
> > can
> > be more complex, because in rare cases a functions can for example
> > take
> > floating point numbers and integers as input at the same time (i.e.
> > `np.ldexp`).
> > 
> > * **Common dtype**: A dtype which can represent all input data. In
> > general this means that all inputs can be safely cast to this
> > dtype.
> > Within numpy this is the normal and simplest form of promotion.
> > 
> > * **`type1, type2 -> type3`**: Defines a promotion or signature.
> > For
> > example adding two integers: `np.int32(5) + np.int32(3)` gives
> > `np.int32(8)`. The dtype signature for that example would be:
> > `int32,
> > int32 -> int32`. A short form for this is also `ii->i` using C-like
> > type codes, this can be found for example in `np.ldexp.types` (and
> > any
> > numpy ufunc).
> > 
> > * **Scalar**: A numpy or python scalar or a **0-D array**. It is
> > important to remember that zero dimensional arrays are treated just
> > like scalars with respect to casting and promotion.
> > 
> > 
> > Current Situation in Numpy
> > --------------------------
> > 
> > The current situation can be understand mostly in terms of safe
> > casting
> > which is defined based on the type hierarchy and is sensitive to
> > values
> > for scalars.
> > 
> > This safe casting based approach is in contrast for example to
> > promotion within C or Julia, which work based on category first.
> > For
> > example `int32` cannot be safely cast to `float32`, but C or Julia
> > will
> > use `int32, float32 -> float32` as the common type/promotion rule
> > for
> > example to decide on the output dtype for addition.
> > 
> > 
> > ### Python Integers and Floats
> > 
> > Note that python integers are handled exactly like numpy ones. They
> > are, however, special in that they do not have a dtype associated
> > with
> > them explicitly. Value based logic, as described here, seems useful
> > for
> > python integers and floats to allow:
> > ```
> > arr = np.arange(10, dtype=np.int8)
> > arr += 1
> > # or:
> > res = arr + 1
> > res.dtype == np.int8
> > ```
> > which ensures that no upcast (for example with higher memory usage)
> > occurs.
> > 
> > 
> > ### Safe Casting
> > 
> > Most safe casting is clearly defined based on whether or not any
> > possible value is representable in the ouput dtype. Within numpy
> > there
> > is currently a single exception to this rule:
> > `np.can_cast(np.int64,
> > np.float64, casting="safe")` is considered to be true although
> > float64
> > cannot represent some large integer values exactly. In contrast,
> > `np.can_cast(np.int32, np.float32, casting="safe")` is `False` and
> > `np.float64` would have to be used if a "safe" cast is desired.
> > 
> > This exception may be one thing that should be changed, however,
> > concurrently the promotion rules have to be adapted to keep doing
> > the
> > same thing, or a larger behaviour change decided.
> > 
> > 
> > #### Scalar based rules
> > 
> > Unlike arrays, where inspection of all values is not feasable, for
> > scalars (and 0-D arrays) the value is inspected. The casting
> > becomes a
> > two step process:
> >   1. The minimal dtype capable of holding the value is found.
> >   2. The normal casting rules are applied to the new dtype.
> > 
> > The first step uses the following rules by finding the minimal
> > dtype
> > within its category:
> > 
> >  * Boolean: Dtype is already minimal
> > 
> >  * Integers:
> >     Casting is possible if output can hold the value. This includes
> > uint8(127) casting to an int8.
> > 
> >  * Floats and Complex
> >     Scalars can be demoted based on value, roughly this avoids
> > overflows:
> >     ```
> >     float16:     -65000 < value < 65000
> >     float32:    -3.4e38 < value < 3.4e38
> >     float64:   -1.7e308 < value < 1.7e308
> >     float128 (largest type, does not apply).
> >     ```
> >     For complex, the logic is simply applied to both real and
> > imaginary
> > part. Complex numbers cannot be downcast to floating point.
> > 
> >  * Others: Dtype is not modified.
> > 
> > 
> > This two step process means that `np.can_cast(np.int16(1024),
> > np.float16)` is `False` even though float16 is capable of exactly
> > representing the value 1024, since value based "demotion" to a
> > lower
> > dtype is used only within each category.
> > 
> > 
> > 
> > ### Common Type Promotion
> > 
> > For most operations in numpy the output type is just the common
> > type of
> > the inputs, this holds for example for concatenation, as well as
> > almost
> > all math funcions (e.g. addition and multiplication have two
> > identical
> > inputs and need one ouput dtype). This operation is exposed as
> > `np.result_type` which includes value based logic, and
> > `np.promote_types` which only accepts dtypes as input.
> > 
> > Normal type promotion without value based/scalar logic finds the
> > smallest type which both inputs can cast to safely. This will be
> > the
> > largest "kind" (bool < unsigned < integer < float < complex <
> > other).
> > 
> > Note that type promotion is handled in a "reduce" manner from left
> > to
> > right. In rare cases this means it is not associatetive: `float32,
> > uint16, int16 -> float32`, but `float32, (uint16, int16) ->
> > float64`.
> > 
> > #### Scalar based rule
> > 
> > When there is a mix of scalars and arrays, numpy will usually allow
> > the
> > scalars to be handled in the same fashion as for "safe" casting
> > rules.
> > 
> > The rules are as follows:
> > 
> > 1. Value based logic is only applied if the "category" of any array
> > is
> > larger or equal to the category of all scalars. If this is not the
> > case, the typical rules are used.
> >     * Specifically, this means: `np.array([1, 2, 3],
> > dtype=np.uint8) +
> > np.float64(12.)` gives a `float64` result, because the
> > `np.float64(12.)` is not considered for being demoted.
> > 
> > 2. Promotion is applied as normally, however, instead of the
> > original
> > dtype, the minimal dtype is used. In the case where the minimal
> > data
> > type is unsigned (say uint8) but the value is small enough, the
> > minimal
> > type may in fact be either `uint8` or `int8` (127 can be both).
> > This
> > promotion is also applied in pairs (reduction-like) from left to
> > right.
> > 
> > 
> > ### General Promotion during Function Execution
> > 
> > General functions (read "ufuncs" such as `np.add`) may have a
> > specific
> > dtype signature which is (for most dtypes) stored e.g. as
> > `np.add.types`. For many of these functions the common type
> > promotion
> > is used unchanged.
> > 
> > However, some functions will employ a slightly different method
> > (which
> > should be equivalent in most cases). They will loop through all
> > loops
> > listed in `np.add.types` in order and find the first one to which
> > all
> > inputs can be safely cast:
> > ```
> > np.divide.types = ['ee->e', 'ff->f', 'dd->d', ...]
> > ```
> > Thus, `np.divide(np.int16(4), np.float16(3)` will refuse the first
> > `float16, float16 -> float16` (`'ee->e'`) loop because `int16`
> > cannot
> > be cast safely, and then pick the float32 (`'ff->f'`) one.
> > 
> > For simple functions, which commonly have two identical inputs,
> > this
> > should be identical, since normally a clear order exists for the
> > dtypes
> > (it does require checking int8 before uint8, etc.).
> > 
> > #### Scalar based rule
> > 
> > When scalars are involved, the "safe" cast logic based on values is
> > applied *if and only if* rule 1. applies as before: That is there
> > must
> > be an array with a higher or equal category as all of the scalars.
> > 
> > In the above `np.divide` example, this means that
> > `np.divide(np.int16(4), np.array([3], dtype=np.float16))` *will*
> > use
> > the `'ee->e'` loop, because the scalar `4` is of a lower or equal
> > category than the array (integer <= float or complex). While
> > checking,
> > 4 is found to be safely castable to float16, since `(u)int8` is
> > sufficient to hold 4 and that can be safely cast to `float16`.
> > However, `np.divide(np.int16(4), np.int16(3))` would use `float32`
> > because both are scalars and thus value based logic is not used
> > (Note
> > that in reality numpy forces double output for an all integer input
> > in
> > divide).
> > 
> > In it is possible for ufuncs to have mixed type signatures (this is
> > very rare within numy) and arbitrary inputs. In this case, in
> > principle, the question is whether or not a clear ordering exists
> > and
> > if the rule of using value based logic is always clear. This is
> > rather
> > academical (I could not find any such function in numpy or
> > `scipy.special` [^scipy-ufuncs]). But consider:
> > ```
> > imaginary_ufunc.types:
> >     int32, float32 -> int32, float32
> >     int64, float32 -> int64, float32
> >     ...
> > ```
> > it is not clear that `np.int64(5) + np.float32(3.)` should be able
> > to
> > demote the `5`. This is very theoretical of course
> > 
> > 
> > 
> > 
> > Footnotes
> > ---------
> > 
> > [^scipy-ufuncs]: See for example these functions:
> >     ```python
> >     import scipy.special
> >     for n, func in scipy.special.__dict__.items():
> >         if not isinstance(func, np.ufunc):
> >             continue
> > 
> >         if func.nin == 1:
> >             # a single input is not interesting
> >             continue
> > 
> >         # check if the signature is not uniform
> >         for types in func.types:
> >             if len(set(types[:func.nin])) != 1:
> >                 break
> >         else:
> >             continue
> >         print(func, func.types)
> >     ```
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190612/e2240b27/attachment-0001.sig>