[Numpy-discussion] New draft of NEP 31 — Context-local and global overrides of the NumPy API

Hameer Abbasi einstein.edison at gmail.com
Mon Oct 28 15:23:02 EDT 2019


Hello everyone, I’ve improved upon the content of NEP 31 to make it simpler, and also according to the new NEP template, only part of the NEP is being sent out to the mailing list. For the full nep, please see PR 14793<https://github.com/numpy/numpy/pull/14793>.

============================================================
NEP 31 — Context-local and global overrides of the NumPy API
============================================================

:Author: Hameer Abbasi <habbasi at quansight.com>
:Author: Ralf Gommers <rgommers at quansight.com>
:Author: Peter Bell <pbell at quansight.com>
:Status: Draft
:Type: Standards Track
:Created: 2019-08-22


Abstract
--------

This NEP proposes to make all of NumPy's public API overridable via an
extensible backend mechanism.

Acceptance of this NEP means NumPy would provide global and context-local
overrides in a separate namespace, as well as a dispatch mechanism similar
to NEP-18 [2]_. First experiences with ``__array_function__`` show that it
is necessary to be able to override NumPy functions that *do not take an
array-like argument*, and hence aren't overridable via
``__array_function__``. The most pressing need is array creation and coercion
functions, such as ``numpy.zeros`` or ``numpy.asarray``; see e.g. NEP-30 [9]_.

This NEP proposes to allow, in an opt-in fashion, overriding any part of the
NumPy API. It is intended as a comprehensive resolution to NEP-22 [3]_, and
obviates the need to add an ever-growing list of new protocols for each new
type of function or object that needs to become overridable.

Motivation and Scope
--------------------

The primary end-goal of this NEP is to make the following possible:

.. code:: python

    # On the library side
    import numpy.overridable as unp

   def library_function(array):
        array = unp.asarray(array)
        # Code using unumpy as usual
        return array

    # On the user side:
    import numpy.overridable as unp
    import uarray as ua
    import dask.array as da

    ua.register_backend(da) # Can be done within Dask itself

    library_function(dask_array)  # works and returns dask_array

    with unp.set_backend(da):
        library_function([1, 2, 3, 4])  # actually returns a Dask array.

Here, ``backend`` can be any compatible object defined either by NumPy or an
external library, such as Dask or CuPy. Ideally, it should be the module
``dask.array`` or ``cupy`` itself.

These kinds of overrides are useful for both the end-user as well as library
authors. End-users may have written or wish to write code that they then later
speed up or move to a different implementation, say PyData/Sparse. They can do
this simply by setting a backend. Library authors may also wish to write code
that is portable across array implementations, for example ``sklearn`` may wish
to write code for a machine learning algorithm that is portable across array
implementations while also using array creation functions.

This NEP takes a holistic approach: It assumes that there are parts of
the API that need to be overridable, and that these will grow over time. It
provides a general framework and a mechanism to avoid a design of a new
protocol each time this is required. This was the goal of ``uarray``: to
allow for overrides in an API without needing the design of a new protocol.

This NEP proposes the following: That ``unumpy`` [8]_  becomes the
recommended override mechanism for the parts of the NumPy API not yet covered
by ``__array_function__`` or ``__array_ufunc__``, and that ``uarray`` is
vendored into a new namespace within NumPy to give users and downstream
dependencies access to these overrides.  This vendoring mechanism is similar
to what SciPy decided to do for making ``scipy.fft`` overridable (see [10]_).

The motivation behind ``uarray`` is manyfold: First, there have been several
attempts to allow dispatch of parts of the NumPy API, including (most
prominently), the ``__array_ufunc__`` protocol in NEP-13 [4]_, and the
``__array_function__`` protocol in NEP-18 [2]_, but this has shown the need
for further protocols to be developed, including a protocol for coercion (see
[5]_, [9]_). The reasons these overrides are needed have been extensively
discussed in the references, and this NEP will not attempt to go into the
details of why these are needed; but in short: It is necessary for library
authors to be able to coerce arbitrary objects into arrays of their own types,
such as CuPy needing to coerce to a CuPy array, for example, instead of
a NumPy array. In simpler words, one needs things like ``np.asarray(...)`` or
an alternative to "just work" and return duck-arrays.

Usage and Impact
----------------

This NEP allows for global and context-local overrides, as well as
automatic overrides a-la ``__array_function__``.

Here are some use-cases this NEP would enable, besides the
first one stated in the motivation section:

The first is allowing alternate dtypes to return their
respective arrays.

.. code:: python

    # Returns an XND array
    x = unp.ones((5, 5), dtype=xnd_dtype) # Or torch dtype

The second is allowing overrides for parts of the API.
This is to allow alternate and/or optimised implementations
for ``np.linalg``, BLAS, and ``np.random``.

.. code:: python

    import numpy as np
    import pyfftw # Or mkl_fft

    # Makes pyfftw the default for FFT
    np.set_global_backend(pyfftw)

    # Uses pyfftw without monkeypatching
    np.fft.fft(numpy_array)

    with np.set_backend(pyfftw) # Or mkl_fft, or numpy
        # Uses the backend you specified
        np.fft.fft(numpy_array)

This will allow an official way for overrides to work with NumPy without
monkeypatching or distributing a modified version of NumPy.

Here are a few other use-cases, implied but not already
stated:

.. code:: python

    data = da.from_zarr('myfile.zarr')
    # result should still be dask, all things being equal
    result = library_function(data)
    result.to_zarr('output.zarr')

This second one would work if ``magic_library`` was built
on top of ``unumpy``.

.. code:: python

    from dask import array as da
    from magic_library import pytorch_predict

    data = da.from_zarr('myfile.zarr')
    # normally here one would use e.g. data.map_overlap
    result = pytorch_predict(data)
    result.to_zarr('output.zarr')

Backward compatibility
----------------------

There are no backward incompatible changes proposed in this NEP.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20191028/3f9cee0b/attachment-0001.html>


More information about the NumPy-Discussion mailing list