[Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules

Ralf Gommers ralf.gommers at gmail.com
Wed Feb 5 13:22:19 EST 2020


On Wed, Feb 5, 2020 at 12:14 PM Eric Wieser <wieser.eric+numpy at gmail.com>
wrote:

> >  scipy.linalg is a superset of numpy.linalg
>
> This isn't completely accurate - numpy.linalg supports almost all
> operations* over stacks of matrices via gufuncs, but scipy.linalg does not
> appear to.
>
> Eric
>
> *: not lstsq due to an ungeneralizable public API
>

That's true for `qr` as well I believe.
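(For context, the gufunc support Eric mentions means most `numpy.linalg` functions operate elementwise over stacks of matrices in a single call, e.g.:)

```python
import numpy as np

# A stack of three 2x2 matrices; np.linalg.inv is implemented as a
# gufunc, so it inverts each matrix along the leading axis in one call.
stacked = np.array([np.eye(2) * k for k in (1.0, 2.0, 4.0)])
invs = np.linalg.inv(stacked)
assert invs.shape == (3, 2, 2)
```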

Indeed some functions have diverged slightly, but that's not on purpose,
more like a lack of time to coordinate. We would like to fix that so
everything is in sync and fully API-compatible again.

Ralf


> On Wed, 5 Feb 2020 at 17:38, Ralf Gommers <ralf.gommers at gmail.com> wrote:
>
>>
>>
>> On Wed, Feb 5, 2020 at 10:01 AM Andreas Mueller <t3kcit at gmail.com> wrote:
>>
>>> A bit late to the NEP 37 party.
>>> I just wanted to say that at least from my perspective it seems a great
>>> solution that will help sklearn move towards more flexible compute engines.
>>> I think one of the biggest issues is array creation (including random
>>> arrays), and that's handled quite nicely with NEP 37.
>>>
>>> There's some discussion on the scikit-learn side here:
>>> https://github.com/scikit-learn/scikit-learn/pull/14963
>>> https://github.com/scikit-learn/scikit-learn/issues/11447
>>>
>>> Two different groups of people tried to use __array_function__ to
>>> delegate to MxNet and CuPy respectively in scikit-learn, and ran into the
>>> same issues.
>>>
>>> There are some remaining issues in sklearn that will not be handled by NEP
>>> 37 but they go beyond NumPy in some sense.
>>> Just to briefly bring them up:
>>>
>>> - We use scipy.linalg in many places, and we would need to do a separate
>>> dispatching to check whether we can use module.linalg instead
>>>  (that might be an issue for many libraries but I'm not sure).
>>>
>>
>> That is an issue, and goes in the opposite direction we need -
>> scipy.linalg is a superset of numpy.linalg, so we'd like to encourage using
>> scipy. This is something we may want to consider fixing by making the
>> dispatch decorator public in numpy and adopting in scipy.
>>
>> Cheers,
>> Ralf
>>
>>
>>
>>>
>>> - Some models have several possible optimization algorithms, some of
>>> which are pure numpy and some which are Cython. If someone provides a
>>> different array module,
>>>  we might want to choose an algorithm that is actually supported by that
>>> module. While this exact issue is maybe sklearn specific, a similar issue
>>> could appear for most downstream libs that use Cython in some places.
>>>  Many Cython algorithms could be implemented in pure numpy with a
>>> potential slowdown, but once we have NEP 37 there might be a benefit to
>>> having a pure NumPy implementation as an alternative code path.
>>>
>>>
>>> Anyway, NEP 37 seems a great step in the right direction and would
>>> enable sklearn to actually dispatch in some places. Dispatching just based
>>> on __array_function__ seems not really feasible so far.
>>>
>>> Best,
>>> Andreas Mueller
>>>
>>>
>>> On 1/6/20 11:29 PM, Stephan Hoyer wrote:
>>>
>>> I am pleased to present a new NumPy Enhancement Proposal for discussion:
>>> "NEP-37: A dispatch protocol for NumPy-like modules." Feedback would be
>>> very welcome!
>>>
>>> The full text follows. The rendered proposal can also be found online at
>>> https://numpy.org/neps/nep-0037-array-module.html
>>>
>>> Best,
>>> Stephan Hoyer
>>>
>>> ===================================================
>>> NEP 37 — A dispatch protocol for NumPy-like modules
>>> ===================================================
>>>
>>> :Author: Stephan Hoyer <shoyer at google.com>
>>> :Author: Hameer Abbasi
>>> :Author: Sebastian Berg
>>> :Status: Draft
>>> :Type: Standards Track
>>> :Created: 2019-12-29
>>>
>>> Abstract
>>> --------
>>>
>>> NEP-18's ``__array_function__`` has been a mixed success. Some projects
>>> (e.g.,
>>> dask, CuPy, xarray, sparse, Pint) have enthusiastically adopted it.
>>> Others
>>> (e.g., PyTorch, JAX, SciPy) have been more reluctant. Here we propose a
>>> new
>>> protocol, ``__array_module__``, that we expect could eventually subsume
>>> most
>>> use-cases for ``__array_function__``. The protocol requires explicit
>>> adoption
>>> by both users and library authors, which ensures backwards
>>> compatibility, and
>>> is also significantly simpler than ``__array_function__``, both of which
>>> we
>>> expect will make it easier to adopt.
>>>
>>> Why ``__array_function__`` hasn't been enough
>>> ---------------------------------------------
>>>
>>> There are two broad ways in which NEP-18 has fallen short of its goals:
>>>
>>> 1. **Maintainability concerns**. `__array_function__` has significant
>>>    implications for libraries that use it:
>>>
>>>    - Projects like `PyTorch
>>>      <https://github.com/pytorch/pytorch/issues/22402>`_, `JAX
>>>      <https://github.com/google/jax/issues/1565>`_ and even
>>> `scipy.sparse
>>>      <https://github.com/scipy/scipy/issues/10362>`_ have been
>>> reluctant to
>>>      implement `__array_function__` in part because they are concerned
>>> about
>>>      **breaking existing code**: users expect NumPy functions like
>>>      ``np.concatenate`` to return NumPy arrays. This is a fundamental
>>>      limitation of the ``__array_function__`` design, which we chose to
>>> allow
>>>      overriding the existing ``numpy`` namespace.
>>>    - ``__array_function__`` currently requires an "all or nothing"
>>> approach to
>>>      implementing NumPy's API. There is no good pathway for **incremental
>>>      adoption**, which is particularly problematic for established
>>> projects
>>>      for which adopting ``__array_function__`` would result in breaking
>>>      changes.
>>>    - It is no longer possible to use **aliases to NumPy functions**
>>> within
>>>      modules that support overrides. For example, both CuPy and JAX set
>>>      ``result_type = np.result_type``.
>>>    - Implementing **fall-back mechanisms** for unimplemented NumPy
>>> functions
>>>      by using NumPy's implementation is hard to get right (but see the
>>>      `version from dask <https://github.com/dask/dask/pull/5043>`_),
>>> because
>>>      ``__array_function__`` does not present a consistent interface.
>>>      Converting all arguments of array type requires recursing into
>>> generic
>>>      arguments of the form ``*args, **kwargs``.
>>>
>>> 2. **Limitations on what can be overridden.** ``__array_function__`` has
>>> some
>>>    important gaps, most notably array creation and coercion functions:
>>>
>>>    - **Array creation** routines (e.g., ``np.arange`` and those in
>>>      ``np.random``) need some other mechanism for indicating what type of
>>>      arrays to create. `NEP 36 <
>>> https://github.com/numpy/numpy/pull/14715>`_
>>>      proposed adding optional ``like=`` arguments to functions without
>>>      existing array arguments. However, we still lack any mechanism to
>>>      override methods on objects, such as those needed by
>>>      ``np.random.RandomState``.
>>>    - **Array conversion** can't reuse the existing coercion functions
>>> like
>>>      ``np.asarray``, because ``np.asarray`` sometimes means "convert to
>>> an
>>>      exact ``np.ndarray``" and other times means "convert to something
>>> _like_
>>>      a NumPy array." This led to the `NEP 30
>>>      <https://numpy.org/neps/nep-0030-duck-array-protocol.html>`_
>>> proposal for
>>>      a separate ``np.duckarray`` function, but this still does not
>>> resolve how
>>>      to cast one duck array into a type matching another duck array.
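The coercion ambiguity can be demonstrated with any array-like object. In this toy sketch (the ``UnitArray`` class is purely illustrative), ``np.asarray`` always produces a base ``np.ndarray``, silently dropping the duck type:

```python
import numpy as np

class UnitArray:
    # Illustrative duck array: wraps an ndarray together with a unit label.
    def __init__(self, data, unit):
        self.data = np.asarray(data)
        self.unit = unit

    def __array__(self, dtype=None, copy=None):
        # Called by np.asarray; the unit information is lost here.
        return np.asarray(self.data, dtype=dtype)

x = UnitArray([1.0, 2.0], unit='m')
coerced = np.asarray(x)
assert type(coerced) is np.ndarray   # the UnitArray wrapper is gone
```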
>>>
>>> ``get_array_module`` and the ``__array_module__`` protocol
>>> ----------------------------------------------------------
>>>
>>> We propose a new user-facing mechanism for dispatching to a duck-array
>>> implementation, ``numpy.get_array_module``. ``get_array_module``
>>> performs the
>>> same type resolution as ``__array_function__`` and returns a module with
>>> an API
>>> promised to match the standard interface of ``numpy`` that can implement
>>> operations on all provided array types.
>>>
>>> The protocol itself is both simpler and more powerful than
>>> ``__array_function__``, because it doesn't need to worry about actually
>>> implementing functions. We believe it resolves most of the
>>> maintainability and
>>> functionality limitations of ``__array_function__``.
>>>
>>> The new protocol is opt-in, explicit and with local control; see
>>> :ref:`appendix-design-choices` for discussion on the importance of these
>>> design
>>> features.
>>>
>>> The array module contract
>>> =========================
>>>
>>> Modules returned by ``get_array_module``/``__array_module__`` should
>>> make a
>>> best effort to implement NumPy's core functionality on new array
>>> type(s).
>>> Unimplemented functionality should simply be omitted (e.g., accessing an
>>> unimplemented function should raise ``AttributeError``). In the future,
>>> we
>>> anticipate codifying a protocol for requesting restricted subsets of
>>> ``numpy``;
>>> see :ref:`requesting-restricted-subsets` for more details.
>>>
>>> How to use ``get_array_module``
>>> ===============================
>>>
>>> Code that wants to support generic duck arrays should explicitly call
>>> ``get_array_module`` to determine an appropriate array module from which
>>> to
>>> call functions, rather than using the ``numpy`` namespace directly. For
>>> example:
>>>
>>> .. code:: python
>>>
>>>     # calls the appropriate version of np.something for x and y
>>>     module = np.get_array_module(x, y)
>>>     module.something(x, y)
>>>
>>> Both array creation and array conversion are supported, because
>>> dispatching is
>>> handled by ``get_array_module`` rather than via the types of function
>>> arguments. For example, to use random number generation functions or
>>> methods,
>>> we can simply pull out the appropriate submodule:
>>>
>>> .. code:: python
>>>
>>>     def duckarray_add_random(array):
>>>         module = np.get_array_module(array)
>>>         noise = module.random.randn(*array.shape)
>>>         return array + noise
>>>
>>> We can also write the duck-array ``stack`` function from `NEP 30
>>> <https://numpy.org/neps/nep-0030-duck-array-protocol.html>`_, without
>>> the need
>>> for a new ``np.duckarray`` function:
>>>
>>> .. code:: python
>>>
>>>     def duckarray_stack(arrays):
>>>         module = np.get_array_module(*arrays)
>>>         arrays = [module.asarray(arr) for arr in arrays]
>>>         shapes = {arr.shape for arr in arrays}
>>>         if len(shapes) != 1:
>>>             raise ValueError('all input arrays must have the same shape')
>>>         expanded_arrays = [arr[module.newaxis, ...] for arr in arrays]
>>>         return module.concatenate(expanded_arrays, axis=0)
>>>
>>> By default, ``get_array_module`` will return the ``numpy`` module if no
>>> arguments are arrays. This fall-back can be explicitly controlled by
>>> providing
>>> the ``default`` keyword-only argument. It is also possible to indicate
>>> that an
>>> exception should be raised instead of returning a default array module by
>>> setting ``default=None``.
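Since ``get_array_module`` does not exist in NumPy yet, here is a minimal runnable sketch of the fall-back behavior described above, using the ``default`` keyword from the pseudocode later in the NEP:

```python
import numpy

def get_array_module(*arrays, default=numpy):
    # Toy stand-in for the proposed numpy.get_array_module (illustrative).
    implementing = [a for a in arrays if hasattr(type(a), '__array_module__')]
    if not implementing:
        if default is not None:
            return default   # no duck arrays: fall back to the default module
        raise TypeError("no array module found and no default given")
    types = {type(a) for a in implementing}
    for a in implementing:
        module = a.__array_module__(types)
        if module is not NotImplemented:
            return module
    raise TypeError("no common array module found")

assert get_array_module(1.0, [2, 3]) is numpy   # plain arguments -> numpy
try:
    get_array_module(1.0, default=None)         # opt out of the fall-back
except TypeError:
    pass
```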
>>>
>>> How to implement ``__array_module__``
>>> =====================================
>>>
>>> Libraries implementing a duck array type that want to support
>>> ``get_array_module`` need to implement the corresponding protocol,
>>> ``__array_module__``. This new protocol is based on Python's dispatch
>>> protocol
>>> for arithmetic, and is essentially a simpler version of
>>> ``__array_function__``.
>>>
>>> Only one argument is passed into ``__array_module__``, a Python
>>> collection of
>>> unique array types passed into ``get_array_module``, i.e., the types of all arguments
>>> with
>>> an ``__array_module__`` attribute.
>>>
>>> The special method should either return a namespace with an API matching
>>> ``numpy``, or ``NotImplemented``, indicating that it does not know how to
>>> handle the operation:
>>>
>>> .. code:: python
>>>
>>>     class MyArray:
>>>         def __array_module__(self, types):
>>>             if not all(issubclass(t, MyArray) for t in types):
>>>                 return NotImplemented
>>>             return my_array_module
>>>
>>> Returning custom objects from ``__array_module__``
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>> ``my_array_module`` will typically, but need not always, be a Python
>>> module.
>>> Returning a custom object (e.g., with functions implemented via
>>> ``__getattr__``) may be useful for some advanced use cases.
>>>
>>> For example, custom objects could allow for partial implementations of
>>> duck
>>> array modules that fall-back to NumPy (although this is not recommended
>>> in
>>> general because such fall-back behavior can be error prone):
>>>
>>> .. code:: python
>>>
>>>     class MyArray:
>>>         def __array_module__(self, types):
>>>             if all(issubclass(t, MyArray) for t in types):
>>>                 return ArrayModule()
>>>             else:
>>>                 return NotImplemented
>>>
>>>     class ArrayModule:
>>>         def __getattr__(self, name):
>>>             import base_module
>>>             return getattr(base_module, name, getattr(numpy, name))
>>>
>>> Subclassing from ``numpy.ndarray``
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>> All of the same guidance about well-defined type casting hierarchies from
>>> NEP-18 still applies. ``numpy.ndarray`` itself contains a matching
>>> implementation of ``__array_module__``, which is convenient for
>>> subclasses:
>>>
>>> .. code:: python
>>>
>>>     class ndarray:
>>>         def __array_module__(self, types):
>>>             if all(issubclass(t, ndarray) for t in types):
>>>                 return numpy
>>>             else:
>>>                 return NotImplemented
>>>
>>> NumPy's internal machinery
>>> ==========================
>>>
>>> The type resolution rules of ``get_array_module`` follow the same model
>>> as
>>> Python and NumPy's existing dispatch protocols: subclasses are called
>>> before
>>> super-classes, and otherwise left to right. ``__array_module__`` is
>>> guaranteed
>>> to be called only a single time on each unique type.
>>>
>>> The actual implementation of `get_array_module` will be in C, but should
>>> be
>>> equivalent to this Python code:
>>>
>>> .. code:: python
>>>
>>>     def get_array_module(*arrays, default=numpy):
>>>         implementing_arrays, types = _implementing_arrays_and_types(arrays)
>>>         if not implementing_arrays and default is not None:
>>>             return default
>>>         for array in implementing_arrays:
>>>             module = array.__array_module__(types)
>>>             if module is not NotImplemented:
>>>                 return module
>>>         raise TypeError("no common array module found")
>>>
>>>     def _implementing_arrays_and_types(relevant_arrays):
>>>         types = []
>>>         implementing_arrays = []
>>>         for array in relevant_arrays:
>>>             t = type(array)
>>>             if t not in types and hasattr(t, '__array_module__'):
>>>                 types.append(t)
>>>                 # Subclasses before superclasses, otherwise left to right
>>>                 index = len(implementing_arrays)
>>>                 for i, old_array in enumerate(implementing_arrays):
>>>                     if issubclass(t, type(old_array)):
>>>                         index = i
>>>                         break
>>>                 implementing_arrays.insert(index, array)
>>>         return implementing_arrays, types
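To make the resolution order concrete, the following runnable sketch reuses the ``_implementing_arrays_and_types`` logic above with two toy types (names illustrative):

```python
class Parent:
    def __array_module__(self, types):
        return 'parent_module'

class Child(Parent):
    def __array_module__(self, types):
        return 'child_module'

def _implementing_arrays_and_types(relevant_arrays):
    types = []
    implementing_arrays = []
    for array in relevant_arrays:
        t = type(array)
        if t not in types and hasattr(t, '__array_module__'):
            types.append(t)
            # Subclasses before superclasses, otherwise left to right
            index = len(implementing_arrays)
            for i, old_array in enumerate(implementing_arrays):
                if issubclass(t, type(old_array)):
                    index = i
                    break
            implementing_arrays.insert(index, array)
    return implementing_arrays, types

# Even though the Parent instance comes first in the argument list, the
# Child subclass is consulted first, matching Python's rules for
# reflected binary operators.
arrays, types = _implementing_arrays_and_types([Parent(), Child()])
assert [type(a) for a in arrays] == [Child, Parent]
```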
>>>
>>> Relationship with ``__array_ufunc__`` and ``__array_function__``
>>> ----------------------------------------------------------------
>>>
>>> These older protocols have distinct use-cases and should remain
>>> ===============================================================
>>>
>>> ``__array_module__`` is intended to resolve limitations of
>>> ``__array_function__``, so it is natural to consider whether it could
>>> entirely
>>> replace ``__array_function__``. This would offer dual benefits: (1)
>>> simplifying
>>> the user-story about how to override NumPy and (2) removing the slowdown
>>> associated with checking for dispatch when calling every NumPy function.
>>>
>>> However, ``__array_module__`` and ``__array_function__`` are pretty
>>> different
>>> from a user perspective: it requires explicit calls to
>>> ``get_array_module``,
>>> rather than simply reusing original ``numpy`` functions. This is
>>> probably fine
>>> for *libraries* that rely on duck-arrays, but may be frustratingly
>>> verbose for
>>> interactive use.
>>>
>>> Some of the dispatching use-cases for ``__array_ufunc__`` are also
>>> solved by
>>> ``__array_module__``, but not all of them. For example, it is still
>>> useful to
>>> be able to define non-NumPy ufuncs (e.g., from Numba or SciPy) in a
>>> generic way
>>> on non-NumPy arrays (e.g., with dask.array).
>>>
>>> Given their existing adoption and distinct use cases, we don't think it
>>> makes
>>> sense to remove or deprecate ``__array_function__`` and
>>> ``__array_ufunc__`` at
>>> this time.
>>>
>>> Mixin classes to implement ``__array_function__`` and ``__array_ufunc__``
>>> =========================================================================
>>>
>>> Despite the user-facing differences, ``__array_module__`` and a module
>>> implementing NumPy's API still provide sufficient functionality to
>>> implement dispatching with the existing duck array protocols.
>>>
>>> For example, the following mixin classes would provide sensible defaults
>>> for
>>> these special methods in terms of ``get_array_module`` and
>>> ``__array_module__``:
>>>
>>> .. code:: python
>>>
>>>     class ArrayUfuncFromModuleMixin:
>>>
>>>         def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
>>>             arrays = inputs + kwargs.get('out', ())
>>>             try:
>>>                 array_module = np.get_array_module(*arrays)
>>>             except TypeError:
>>>                 return NotImplemented
>>>
>>>             try:
>>>                 # Note this may have false positive matches, if
>>>                 # ufunc.__name__ matches the name of a ufunc defined by
>>>                 # NumPy. Unfortunately there is no way to determine in
>>>                 # which module a ufunc was defined.
>>>                 new_ufunc = getattr(array_module, ufunc.__name__)
>>>             except AttributeError:
>>>                 return NotImplemented
>>>
>>>             try:
>>>                 callable = getattr(new_ufunc, method)
>>>             except AttributeError:
>>>                 return NotImplemented
>>>
>>>             return callable(*inputs, **kwargs)
>>>
>>>     class ArrayFunctionFromModuleMixin:
>>>
>>>         def __array_function__(self, func, types, args, kwargs):
>>>             array_module = self.__array_module__(types)
>>>             if array_module is NotImplemented:
>>>                 return NotImplemented
>>>
>>>             # Traverse submodules to find the appropriate function
>>>             modules = func.__module__.split('.')
>>>             assert modules[0] == 'numpy'
>>>             module = array_module
>>>             for submodule in modules[1:]:
>>>                 module = getattr(module, submodule, None)
>>>             new_func = getattr(module, func.__name__, None)
>>>             if new_func is None:
>>>                 return NotImplemented
>>>
>>>             return new_func(*args, **kwargs)
>>>
>>> To make it easier to write duck arrays, we could also add these mixin
>>> classes
>>> into ``numpy.lib.mixins`` (but the examples above may suffice).
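A runnable sketch of the ``__array_ufunc__`` mixin idea, simplified to resolve the module via ``__array_module__`` directly (since ``np.get_array_module`` does not exist yet; all class and module names here are illustrative):

```python
import numpy as np

class WrappedModule:
    # Toy duck-array "namespace" implementing a single function.
    @staticmethod
    def negative(x):
        return Wrapped(-x.data)

class Wrapped:
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_module__(self, types):
        if all(issubclass(t, Wrapped) for t in types):
            return WrappedModule
        return NotImplemented

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # As in ArrayUfuncFromModuleMixin, but calling __array_module__
        # directly rather than the proposed np.get_array_module.
        module = self.__array_module__({type(x) for x in inputs})
        if module is NotImplemented:
            return NotImplemented
        new_ufunc = getattr(module, ufunc.__name__, None)
        if new_ufunc is None:
            return NotImplemented
        return getattr(new_ufunc, method)(*inputs, **kwargs)

x = Wrapped([1.0, -2.0])
y = np.negative(x)   # dispatches through __array_ufunc__ to WrappedModule
assert isinstance(y, Wrapped)
```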
>>>
>>> Alternatives considered
>>> -----------------------
>>>
>>> Naming
>>> ======
>>>
>>> We like the name ``__array_module__`` because it mirrors the existing
>>> ``__array_function__`` and ``__array_ufunc__`` protocols. Another
>>> reasonable
>>> choice could be ``__array_namespace__``.
>>>
>>> It is less clear what the NumPy function that calls this protocol should
>>> be
>>> called (``get_array_module`` in this proposal). Some possible
>>> alternatives:
>>> ``array_module``, ``common_array_module``, ``resolve_array_module``,
>>> ``get_namespace``, ``get_numpy``, ``get_numpylike_module``,
>>> ``get_duck_array_module``.
>>>
>>> .. _requesting-restricted-subsets:
>>>
>>> Requesting restricted subsets of NumPy's API
>>> ============================================
>>>
>>> Over time, NumPy has accumulated a very large API surface, with over 600
>>> attributes in the top level ``numpy`` module alone. It is unlikely that
>>> any
>>> duck array library could or would want to implement all of these
>>> functions and
>>> classes, because the frequently used subset of NumPy is much smaller.
>>>
>>> We think it would be a useful exercise to define "minimal" subset(s) of
>>> NumPy's
>>> API, omitting rarely used or non-recommended functionality. For example,
>>> minimal NumPy might include ``stack``, but not the other stacking
>>> functions
>>> ``column_stack``, ``dstack``, ``hstack`` and ``vstack``. This could
>>> clearly
>>> indicate to duck array authors and users what functionality is core and
>>> what
>>> functionality they can skip.
>>>
>>> Support for requesting a restricted subset of NumPy's API would be a
>>> natural
>>> feature to include in ``get_array_module`` and ``__array_module__``,
>>> e.g.,
>>>
>>> .. code:: python
>>>
>>>     # array_module is only guaranteed to contain "minimal" NumPy
>>>     array_module = np.get_array_module(*arrays, request='minimal')
>>>
>>> To facilitate testing with NumPy and use with any valid duck array
>>> library,
>>> NumPy itself would return restricted versions of the ``numpy`` module
>>> when
>>> ``get_array_module`` is called only on NumPy arrays. Omitted functions
>>> would
>>> simply not exist.
>>>
>>> Unfortunately, we have not yet figured out what these restricted subsets
>>> should
>>> be, so it doesn't make sense to do this yet. When/if we do, we could
>>> either add
>>> new keyword arguments to ``get_array_module`` or add new top level
>>> functions,
>>> e.g., ``get_minimal_array_module``. We would also need to add either a
>>> new
>>> protocol patterned off of ``__array_module__`` (e.g.,
>>> ``__array_module_minimal__``), or could add an optional second argument
>>> to
>>> ``__array_module__`` (catching errors with ``try``/``except``).
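The ``try``/``except`` idea could look something like this sketch; the ``request`` argument and ``resolve_array_module`` helper are hypothetical, not part of the current proposal:

```python
def resolve_array_module(array, types, request=None):
    # Hypothetical caller-side shim: prefer the two-argument form, and
    # fall back for implementations of the original one-argument
    # __array_module__ signature.
    try:
        return array.__array_module__(types, request)
    except TypeError:
        return array.__array_module__(types)

class OldStyle:
    def __array_module__(self, types):
        return 'full_module'

class NewStyle:
    def __array_module__(self, types, request=None):
        return 'minimal_module' if request == 'minimal' else 'full_module'

assert resolve_array_module(OldStyle(), set(), 'minimal') == 'full_module'
assert resolve_array_module(NewStyle(), set(), 'minimal') == 'minimal_module'
```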
>>>
>>> A new namespace for implicit dispatch
>>> =====================================
>>>
>>> Instead of supporting overrides in the main `numpy` namespace with
>>> ``__array_function__``, we could create a new opt-in namespace, e.g.,
>>> ``numpy.api``, with versions of NumPy functions that support
>>> dispatching. These
>>> overrides would need new opt-in protocols, e.g.,
>>> ``__array_function_api__``
>>> patterned off of ``__array_function__``.
>>>
>>> This would resolve the biggest limitations of ``__array_function__`` by
>>> being
>>> opt-in and would also allow for unambiguously overriding functions like
>>> ``asarray``, because ``np.api.asarray`` would always mean "convert an
>>> array-like object."  But it wouldn't solve all the dispatching needs met
>>> by
>>> ``__array_module__``, and would leave us with supporting a considerably
>>> more
>>> complex protocol both for array users and implementors.
>>>
>>> We could potentially implement such a new namespace *via* the
>>> ``__array_module__`` protocol. Certainly some users would find this
>>> convenient,
>>> because it is slightly less boilerplate. But this would leave users with
>>> a
>>> confusing choice: when should they use `get_array_module` vs.
>>> `np.api.something`. Also, we would have to add and maintain a whole new
>>> module,
>>> which is considerably more expensive than merely adding a function.
>>>
>>> Dispatching on both types and arrays instead of only types
>>> ==========================================================
>>>
>>> Instead of supporting dispatch only via unique array types, we could also
>>> support dispatch via array objects, e.g., by passing an ``arrays``
>>> argument as
>>> part of the ``__array_module__`` protocol. This could potentially be
>>> useful for
>>> dispatch for arrays with metadata, such as those provided by Dask and Pint, but
>>> would
>>> impose costs in terms of type safety and complexity.
>>>
>>> For example, a library that supports arrays on both CPUs and GPUs might
>>> decide
>>> on which device to create new arrays from functions like ``ones``
>>> based on
>>> input arguments:
>>>
>>> .. code:: python
>>>
>>>     class Array:
>>>         def __array_module__(self, types, arrays):
>>>             useful_arrays = tuple(a for a in arrays if isinstance(a, Array))
>>>             if not useful_arrays:
>>>                 return NotImplemented
>>>             prefer_gpu = any(a.prefer_gpu for a in useful_arrays)
>>>             return ArrayModule(prefer_gpu)
>>>
>>>     class ArrayModule:
>>>         def __init__(self, prefer_gpu):
>>>             self.prefer_gpu = prefer_gpu
>>>
>>>         def __getattr__(self, name):
>>>             import base_module
>>>             base_func = getattr(base_module, name)
>>>             return functools.partial(base_func, prefer_gpu=self.prefer_gpu)
>>>
>>> This might be useful, but it's not clear if we really need it. Pint
>>> seems to
>>> get along OK without any explicit array creation routines (favoring
>>> multiplication by units, e.g., ``np.ones(5) * ureg.m``), and for the
>>> most part
>>> Dask is also OK with existing ``__array_function__`` style overrides
>>> (e.g.,
>>> favoring ``np.ones_like`` over ``np.ones``). Choosing whether to place
>>> an array
>>> on the CPU or GPU could be solved by `making array creation lazy
>>> <https://github.com/google/jax/pull/1668>`_.
>>>
>>> .. _appendix-design-choices:
>>>
>>> Appendix: design choices for API overrides
>>> ------------------------------------------
>>>
>>> There is a large range of possible design choices for overriding NumPy's
>>> API.
>>> Here we discuss three major axes of the design decision that guided our
>>> design
>>> for ``__array_module__``.
>>>
>>> Opt-in vs. opt-out for users
>>> ============================
>>>
>>> The ``__array_ufunc__`` and ``__array_function__`` protocols provide a
>>> mechanism for overriding NumPy functions *within NumPy's existing
>>> namespace*.
>>> This means that users need to explicitly opt-out if they do not want any
>>> overridden behavior, e.g., by casting arrays with ``np.asarray()``.
>>>
>>> In theory, this approach lowers the barrier for adopting these protocols
>>> in
>>> user code and libraries, because code that uses the standard NumPy
>>> namespace is
>>> automatically compatible. But in practice, this hasn't worked out. For
>>> example,
>>> most well-maintained libraries that use NumPy follow the best practice of
>>> casting all inputs with ``np.asarray()``, which they would have to
>>> explicitly
>>> relax to use ``__array_function__``. Our experience has been that making
>>> a
>>> library compatible with a new duck array type typically requires at
>>> least a
>>> small amount of work to accommodate differences in the data model and
>>> operations
>>> that can be implemented efficiently.
>>>
>>> These opt-out approaches also considerably complicate backwards
>>> compatibility
>>> for libraries that adopt these protocols, because by opting in as a
>>> library
>>> they also opt-in their users, whether they expect it or not. For winning
>>> over
>>> libraries that have been unable to adopt ``__array_function__``, an
>>> opt-in
>>> approach seems like a must.
>>>
>>> Explicit vs. implicit choice of implementation
>>> ==============================================
>>>
>>> Both ``__array_ufunc__`` and ``__array_function__`` have implicit
>>> control over
>>> dispatching: the dispatched functions are determined via the appropriate
>>> protocols in every function call. This generalizes well to handling many
>>> different types of objects, as evidenced by its use for implementing
>>> arithmetic
>>> operators in Python, but it has two downsides:
>>>
>>> 1. *Speed*: it imposes additional overhead in every function call,
>>> because each
>>>    function call needs to inspect each of its arguments for overrides.
>>> This is
>>>    why arithmetic on builtin Python numbers is slow.
>>> 2. *Readability*: it is no longer immediately evident to readers of
>>> code what
>>>    happens when a function is called, because the function's
>>> implementation
>>>    could be overridden by any of its arguments.
>>>
>>> In contrast, importing a new library (e.g., ``import dask.array as
>>> da``) with
>>> an API matching NumPy is entirely explicit. There is no overhead from
>>> dispatch
>>> or ambiguity about which implementation is being used.
>>>
>>> Explicit and implicit choice of implementations are not mutually
>>> exclusive
>>> options. Indeed, most implementations of NumPy API overrides via
>>> ``__array_function__`` that we are familiar with (namely, dask, CuPy and
>>> sparse, but not Pint) also include an explicit way to use their version
>>> of
>>> NumPy's API by importing a module directly (``dask.array``, ``cupy`` or
>>> ``sparse``, respectively).
>>>
>>> Local vs. non-local vs. global control
>>> ======================================
>>>
>>> The final design axis is how users control the choice of API:
>>>
>>> - **Local control**, as exemplified by multiple dispatch and Python
>>> protocols for
>>>   arithmetic, determines which implementation to use either by checking
>>> types
>>>   or calling methods on the direct arguments of a function.
>>> - **Non-local control** such as `np.errstate
>>>   <
>>> https://docs.scipy.org/doc/numpy/reference/generated/numpy.errstate.html
>>> >`_
>>>   overrides behavior with global-state via function decorators or
>>>   context-managers. Control is determined hierarchically, via the
>>> inner-most
>>>   context.
>>> - **Global control** provides a mechanism for users to set default
>>> behavior,
>>>   either via function calls or configuration files. For example,
>>> matplotlib
>>>   allows setting a global choice of plotting backend.
>>>
>>> Local control is generally considered a best practice for API design,
>>> because
>>> control flow is entirely explicit, which makes it the easiest to
>>> understand.
>>> Non-local and global control are occasionally used, but generally either
>>> due to
>>> ignorance or a lack of better alternatives.
>>>
>>> In the case of duck typing for NumPy's public API, we think non-local or
>>> global
>>> control would be mistakes, mostly because they **don't compose well**.
>>> If one
>>> library sets/needs one set of overrides and then internally calls a
>>> routine
>>> that expects another set of overrides, the resulting behavior may be very
>>> surprising. Higher order functions are especially problematic, because
>>> the
>>> context in which functions are evaluated may not be the context in which
>>> they
>>> are defined.
>>>
>>> One class of override use cases where we think non-local and global
>>> control are
>>> appropriate is for choosing a backend system that is guaranteed to have
>>> an
>>> entirely consistent interface, such as a faster alternative
>>> implementation of
>>> ``numpy.fft`` on NumPy arrays. However, these are out of scope for the
>>> current
>>> proposal, which is focused on duck arrays.
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>>>

