[Numpy-discussion] Revised NEP-18, __array_function__ protocol

Marten van Kerkwijk m.h.vankerkwijk at gmail.com
Wed Jun 27 11:41:39 EDT 2018


Hi Hameer,

I'm confused: Isn't your reference array just `self`?
All the best,

Marten


On Wed, Jun 27, 2018 at 2:27 AM, Hameer Abbasi <einstein.edison at gmail.com>
wrote:

>
>
> On 27. Jun 2018 at 07:48, Stephan Hoyer <shoyer at gmail.com> wrote:
>
>
> After much discussion (and the addition of three new co-authors!), I’m
> pleased to present a significantly revised version of NumPy Enhancement
> Proposal 18: A dispatch mechanism for NumPy's high level array functions:
> http://www.numpy.org/neps/nep-0018-array-function-protocol.html
>
> The full text is also included below.
>
> Best,
> Stephan
>
> ===========================================================
> A dispatch mechanism for NumPy's high level array functions
> ===========================================================
>
> :Author: Stephan Hoyer <shoyer at google.com>
> :Author: Matthew Rocklin <mrocklin at gmail.com>
> :Author: Marten van Kerkwijk <mhvk at astro.utoronto.ca>
> :Author: Hameer Abbasi <hameerabbasi at yahoo.com>
> :Author: Eric Wieser <wieser.eric at gmail.com>
> :Status: Draft
> :Type: Standards Track
> :Created: 2018-05-29
>
> Abstract
> -------
>
> We propose the ``__array_function__`` protocol, to allow arguments of NumPy
> functions to define how that function operates on them. This will allow
> using NumPy as a high level API for efficient multi-dimensional array
> operations, even with array implementations that differ greatly from
> ``numpy.ndarray``.
>
> Detailed description
> --------------------
>
> NumPy's high level ndarray API has been implemented several times
> outside of NumPy itself for different architectures, such as for GPU
> arrays (CuPy), Sparse arrays (scipy.sparse, pydata/sparse) and parallel
> arrays (Dask array) as well as various NumPy-like implementations in the
> deep learning frameworks, like TensorFlow and PyTorch.
>
> Similarly, there are many projects that build on top of the NumPy API
> for labeled and indexed arrays (XArray), automatic differentiation
> (Autograd, Tangent), masked arrays (numpy.ma), physical units
> (astropy.units, pint, unyt), etc. that add additional functionality on
> top of the NumPy API. Most of these projects also implement a close
> variation of NumPy's high level API.
>
> We would like to be able to use these libraries together: for example,
> we would like to be able to place a CuPy array within XArray, or perform
> automatic differentiation on Dask array code. This would be easier to
> accomplish if code written for NumPy ndarrays could also be used by
> other NumPy-like projects.
>
> For example, we would like for the following code example to work
> equally well with any NumPy-like array object:
>
> .. code:: python
>
>     def f(x):
>         y = np.tensordot(x, x.T)
>         return np.mean(np.exp(y))
>
> Some of this is possible today with various protocol mechanisms within
> NumPy.
>
> -  The ``np.exp`` function checks the ``__array_ufunc__`` protocol
> -  The ``.T`` method works using Python's method dispatch
> -  The ``np.mean`` function explicitly checks for a ``.mean`` method on
>    the argument
>
> However, other functions, like ``np.tensordot``, do not dispatch, and
> instead are likely to coerce to a NumPy array (using the ``__array__``
> protocol) or error outright. To achieve enough coverage of the NumPy API
> to support downstream projects like XArray and autograd, we want to
> support *almost all* functions within NumPy, which calls for a more
> far-reaching protocol than just ``__array_ufunc__``. We would like a
> protocol that allows arguments of a NumPy function to take control and
> divert execution to another function (for example a GPU or parallel
> implementation) in a way that is safe and consistent across projects.
>
> Implementation
> --------------
>
> We propose adding support for a new protocol in NumPy,
> ``__array_function__``.
>
> This protocol is intended to be a catch-all for NumPy functionality that
> is not covered by the ``__array_ufunc__`` protocol for universal functions
> (like ``np.exp``). The semantics are very similar to ``__array_ufunc__``,
> except the operation is specified by an arbitrary callable object rather
> than a ufunc instance and method.
>
> A prototype implementation can be found in
> `this notebook <https://nbviewer.jupyter.org/gist/shoyer/1f0a308a06cd96df20879a1ddb8f0006>`_.
>
> The interface
> ~~~~~~~~~~~~~
>
> We propose the following signature for implementations of
> ``__array_function__``:
>
> .. code-block:: python
>
>     def __array_function__(self, func, types, args, kwargs)
>
> -  ``func`` is an arbitrary callable exposed by NumPy's public API,
>    which was called in the form ``func(*args, **kwargs)``.
> -  ``types`` is a ``frozenset`` of unique argument types from the original
>    NumPy function call that implement ``__array_function__``.
> -  The tuple ``args`` and dict ``kwargs`` are directly passed on from the
>    original call.
>
> Unlike ``__array_ufunc__``, there are no high-level guarantees about the
> type of ``func``, or about which of ``args`` and ``kwargs`` may contain
> objects implementing the array API.
>
> As a convenience for ``__array_function__`` implementors, ``types``
> provides all argument types with an ``'__array_function__'`` attribute.
> This allows downstream implementations to quickly determine if they are
> likely able to support the operation. A ``frozenset`` is used to ensure
> that ``__array_function__`` implementations cannot rely on the iteration
> order of ``types``, which would facilitate violating the well-defined
> "Type casting hierarchy" described in
> `NEP-13 <https://www.numpy.org/neps/nep-0013-ufunc-overrides.html>`_.
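>
> For example, for a call like ``np.broadcast_to(my_array, shape=(3, 4),
> subok=True)``, where ``my_array`` is an instance of a hypothetical
> ``MyArray`` class implementing the protocol (like the one defined in the
> next section), NumPy would invoke something like:
>
> .. code:: python
>
>     my_array.__array_function__(
>         np.broadcast_to,                    # func
>         frozenset({MyArray}),               # types
>         (my_array,),                        # args
>         {'shape': (3, 4), 'subok': True},   # kwargs
>     )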
>
> Example for a project implementing the NumPy API
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Most implementations of ``__array_function__`` will start with two
> checks:
>
> 1.  Is the given function something that we know how to overload?
> 2.  Are all arguments of a type that we know how to handle?
>
> If these conditions hold, ``__array_function__`` should return
> the result from calling its implementation for ``func(*args, **kwargs)``.
> Otherwise, it should return the sentinel value ``NotImplemented``,
> indicating that the function is not implemented by these types. This is
> preferable to raising ``TypeError`` directly, because it gives *other*
> arguments the opportunity to define the operations.
>
> There are no general requirements on the return value from
> ``__array_function__``, although most sensible implementations should
> probably return array(s) with the same type as one of the function's
> arguments. If/when Python gains
> `typing support for protocols <https://www.python.org/dev/peps/pep-0544/>`_
> and NumPy adds static type annotations, the ``@overload`` implementation
> for ``SupportsArrayFunction`` will indicate a return type of ``Any``.
>
> It may also be convenient to define a custom decorator (``implements``
> below) for registering ``__array_function__`` implementations.
>
> .. code:: python
>
>     HANDLED_FUNCTIONS = {}
>
>     class MyArray:
>         def __array_function__(self, func, types, args, kwargs):
>             if func not in HANDLED_FUNCTIONS:
>                 return NotImplemented
>             # Note: this allows subclasses that don't override
>             # __array_function__ to handle MyArray objects
>             if not all(issubclass(t, MyArray) for t in types):
>                 return NotImplemented
>             return HANDLED_FUNCTIONS[func](*args, **kwargs)
>
>     def implements(numpy_function):
>         """Register an __array_function__ implementation for MyArray objects."""
>         def decorator(func):
>             HANDLED_FUNCTIONS[numpy_function] = func
>             return func
>         return decorator
>
>     @implements(np.concatenate)
>     def concatenate(arrays, axis=0, out=None):
>         ...  # implementation of concatenate for MyArray objects
>
>     @implements(np.broadcast_to)
>     def broadcast_to(array, shape):
>         ...  # implementation of broadcast_to for MyArray objects
>
> Note that it is not required for ``__array_function__`` implementations to
> include *all* of the corresponding NumPy function's optional arguments
> (e.g., ``broadcast_to`` above omits the irrelevant ``subok`` argument).
> Optional arguments are only passed in to ``__array_function__`` if they
> were explicitly used in the NumPy function call.
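>
> With these handlers registered, calling code might look something like
> this (a sketch only; the ``MyArray`` constructor here is hypothetical):
>
> .. code:: python
>
>     x = MyArray(...)  # hypothetical constructor
>     y = MyArray(...)
>
>     # NumPy finds MyArray.__array_function__ on the arguments, which looks
>     # up np.concatenate in HANDLED_FUNCTIONS and calls the implementation
>     # registered above.
>     z = np.concatenate([x, y], axis=0)
>
>     # np.tensordot was never registered with @implements, so
>     # MyArray.__array_function__ returns NotImplemented and NumPy raises
>     # TypeError.
>     np.tensordot(x, y, axes=1)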
>
> Necessary changes within the NumPy codebase itself
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> This will require two changes within the NumPy codebase:
>
> 1. A function to inspect available inputs, look for the
>    ``__array_function__`` attribute on those inputs, and call those
>    methods appropriately until one succeeds.  This needs to be fast in the
>    common all-NumPy case, and have acceptable performance (no worse than
>    linear time) even if the number of overloaded inputs is large (e.g.,
>    as might be the case for `np.concatenate`).
>
>    This is one additional function of moderate complexity.
> 2. Calling this function within all relevant NumPy functions.
>
>    This affects many parts of the NumPy codebase, although with very low
>    complexity.
>
> Finding and calling the right ``__array_function__``
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Given a NumPy function, ``*args`` and ``**kwargs`` inputs, we need to
> search through ``*args`` and ``**kwargs`` for all appropriate inputs
> that might have the ``__array_function__`` attribute. Then we need to
> select among those possible methods and execute the right one.
> Negotiating between several possible implementations can be complex.
>
> Finding arguments
> '''''''''''''''''
>
> Valid arguments may be directly in the ``*args`` and ``**kwargs``, such
> as in the case for ``np.tensordot(left, right, out=out)``, or they may
> be nested within lists or dictionaries, such as in the case of
> ``np.concatenate([x, y, z])``. This can be problematic for two reasons:
>
> 1. Some functions are given long lists of values, and traversing them
>    might be prohibitively expensive.
> 2. Some functions may have arguments that we don't want to inspect, even
>    if they have the ``__array_function__`` method.
>
> To resolve these issues, NumPy functions should explicitly indicate which
> of their arguments may be overloaded, and how these arguments should be
> checked. As a rule, this should include all arguments documented as either
> ``array_like`` or ``ndarray``.
>
> We propose to do so by writing "dispatcher" functions for each overloaded
> NumPy function:
>
> - These functions will be called with the exact same arguments that were
>   passed into the NumPy function (i.e., ``dispatcher(*args, **kwargs)``),
>   and should return an iterable of arguments to check for overrides.
> - Dispatcher functions are required to share the exact same positional,
>   optional and keyword-only arguments as their corresponding NumPy
>   functions. Otherwise, valid invocations of a NumPy function could
>   result in an error when calling its dispatcher.
> - Because default *values* for keyword arguments do not have
>   ``__array_function__`` attributes, by convention we set all default
>   argument values to ``None``. This reduces the likelihood of signatures
>   falling out of sync, and minimizes extraneous information in the
>   dispatcher. The only exception should be cases where the argument value
>   in some way affects dispatching, which should be rare.
>
> An example of the dispatcher for ``np.concatenate`` may be instructive:
>
> .. code:: python
>
>     def _concatenate_dispatcher(arrays, axis=None, out=None):
>         for array in arrays:
>             yield array
>         if out is not None:
>             yield out
>
> The concatenate dispatcher is written as a generator function, which
> allows it to potentially include the value of the optional ``out``
> argument without needing to create a new sequence with the (potentially
> long) list of objects to be concatenated.
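>
> For example, for a hypothetical call ``np.concatenate([x, y], axis=0,
> out=z)``, the dispatcher would yield ``x``, ``y`` and ``z``; ``axis`` is
> never yielded because it is not an array argument:
>
> .. code:: python
>
>     # NumPy would call the dispatcher with the user's arguments and check
>     # exactly the yielded objects for __array_function__.
>     relevant = list(_concatenate_dispatcher([x, y], axis=0, out=z))
>     assert relevant == [x, y, z]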
>
> Trying ``__array_function__`` methods until the right one works
> '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
>
> Many arguments may implement the ``__array_function__`` protocol. Some
> of these may decide that, given the available inputs, they are unable to
> determine the correct result. How do we call the right one? If several
> are valid then which has precedence?
>
> For the most part, the rules for dispatch with ``__array_function__``
> match those for ``__array_ufunc__`` (see
> `NEP-13 <https://www.numpy.org/neps/nep-0013-ufunc-overrides.html>`_).
> In particular:
>
> -  NumPy will gather implementations of ``__array_function__`` from all
>    specified inputs and call them in order: subclasses before
>    superclasses, and otherwise left to right. Note that in some edge cases
>    involving subclasses, this differs slightly from the
>    `current behavior <https://bugs.python.org/issue30140>`_ of Python.
> -  Implementations of ``__array_function__`` indicate that they can
>    handle the operation by returning any value other than
>    ``NotImplemented``.
> -  If all ``__array_function__`` methods return ``NotImplemented``,
>    NumPy will raise ``TypeError``.
>
> One deviation from the current behavior of ``__array_ufunc__`` is that
> NumPy will only call ``__array_function__`` on the *first* argument of
> each unique type. This matches Python's
> `rule for calling reflected methods <https://docs.python.org/3/reference/datamodel.html#object.__ror__>`_,
> and this ensures that checking overloads has acceptable performance even
> when there are a large number of overloaded arguments. To avoid long-term
> divergence between these two dispatch protocols, we should
> `also update <https://github.com/numpy/numpy/issues/11306>`_
> ``__array_ufunc__`` to match this behavior.
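>
> To make these rules concrete, the dispatch function (called
> ``try_array_function_override`` below) might look something like the
> following rough sketch; names and details are illustrative only:
>
> .. code:: python
>
>     def try_array_function_override(public_api, relevant_args, args, kwargs):
>         # Keep one representative argument per unique type, placing
>         # subclasses ahead of their superclasses and otherwise preserving
>         # left-to-right order. (The special case for numpy.ndarray itself
>         # is described in the next section and omitted here.)
>         overloaded_args = []
>         for arg in relevant_args:
>             arg_type = type(arg)
>             if not hasattr(arg_type, '__array_function__'):
>                 continue
>             if any(arg_type is type(other) for other in overloaded_args):
>                 continue  # only the first argument of each unique type
>             index = len(overloaded_args)
>             for i, other in enumerate(overloaded_args):
>                 if issubclass(arg_type, type(other)):
>                     index = i
>                     break
>             overloaded_args.insert(index, arg)
>
>         if not overloaded_args:
>             return False, None  # fast path: no overrides to consider
>
>         types = frozenset(type(arg) for arg in overloaded_args)
>         for arg in overloaded_args:
>             result = arg.__array_function__(public_api, types, args, kwargs)
>             if result is not NotImplemented:
>                 return True, result
>
>         raise TypeError('no implementation of {!r} found for types {}'
>                         .format(public_api, list(types)))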
>
> Special handling of ``numpy.ndarray``
> '''''''''''''''''''''''''''''''''''''
>
> The use cases for subclasses with ``__array_function__`` are the same as
> those with ``__array_ufunc__``, so ``numpy.ndarray`` should also define a
> ``__array_function__`` method mirroring ``ndarray.__array_ufunc__``:
>
> .. code:: python
>
>     def __array_function__(self, func, types, args, kwargs):
>         # Cannot handle items that have __array_function__ other than
>         # our own.
>         for t in types:
>             if (hasattr(t, '__array_function__') and
>                     t.__array_function__ is not ndarray.__array_function__):
>                 return NotImplemented
>
>         # Arguments contain no overrides, so we can safely call the
>         # overloaded function again.
>         return func(*args, **kwargs)
>
> To avoid infinite recursion, the dispatch rules for ``__array_function__``
> also need the same special case they have for ``__array_ufunc__``: any
> arguments with an ``__array_function__`` method that is identical to
> ``numpy.ndarray.__array_function__`` are not called as
> ``__array_function__`` implementations.
>
> Changes within NumPy functions
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Given a function defining the above behavior, for now call it
> ``try_array_function_override``, we now need to call that function from
> within every relevant NumPy function. This is a pervasive change, but of
> fairly simple and innocuous code that should complete quickly and
> without effect if no arguments implement the ``__array_function__``
> protocol.
>
> In most cases, these functions should be written using the
> ``array_function_dispatch`` decorator, which also associates dispatcher
> functions:
>
> .. code:: python
>
>     def array_function_dispatch(dispatcher):
>         """Wrap a function for dispatch with the __array_function__ protocol."""
>         def decorator(func):
>             @functools.wraps(func)
>             def new_func(*args, **kwargs):
>                 relevant_arguments = dispatcher(*args, **kwargs)
>                 success, value = try_array_function_override(
>                     new_func, relevant_arguments, args, kwargs)
>                 if success:
>                     return value
>                 return func(*args, **kwargs)
>             return new_func
>         return decorator
>
>     # example usage
>     def _broadcast_to_dispatcher(array, shape, subok=None, **ignored_kwargs):
>         return (array,)
>
>     @array_function_dispatch(_broadcast_to_dispatcher)
>     def broadcast_to(array, shape, subok=False):
>         ...  # existing definition of np.broadcast_to
>
> Using a decorator is great! We don't need to change the definitions of
> existing NumPy functions, and only need to write a few additional lines
> for the dispatcher function. We could even reuse a single dispatcher for
> families of functions with the same signature (e.g., ``sum`` and ``prod``).
> For such functions, the largest change could be adding a few lines to the
> docstring to note which arguments are checked for overloads.
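>
> For instance, ``sum`` and ``prod`` could share a single dispatcher along
> these lines (a sketch only, with simplified signatures):
>
> .. code:: python
>
>     def _reduction_dispatcher(a, axis=None, dtype=None, out=None,
>                               keepdims=None):
>         if out is None:
>             return (a,)
>         return (a, out)
>
>     @array_function_dispatch(_reduction_dispatcher)
>     def sum(a, axis=None, dtype=None, out=None, keepdims=False):
>         ...  # existing definition of np.sum
>
>     @array_function_dispatch(_reduction_dispatcher)
>     def prod(a, axis=None, dtype=None, out=None, keepdims=False):
>         ...  # existing definition of np.prod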
>
> It's particularly worth calling out the decorator's use of
> ``functools.wraps``:
>
> - This ensures that the wrapped function has the same name and docstring
>   as the wrapped NumPy function.
> - On Python 3, it also ensures that the decorator function copies the
>   original function signature, which is important for introspection-based
>   tools such as auto-complete. If we care about preserving function
>   signatures on Python 2, for the
>   `short while longer <http://www.numpy.org/neps/nep-0014-dropping-python2.7-proposal.html>`_
>   that NumPy supports Python 2.7, we could do so by adding a vendored
>   dependency on the (single-file, BSD licensed)
>   `decorator library <https://github.com/micheles/decorator>`_.
> - Finally, it ensures that the wrapped function
>   `can be pickled <http://gael-varoquaux.info/programming/decoration-in-python-done-right-decorating-and-pickling.html>`_.
>
> In a few cases, it would not make sense to use the
> ``array_function_dispatch`` decorator directly, but an override
> implementation in terms of ``try_array_function_override`` should still
> be straightforward.
>
> - Functions written entirely in C (e.g., ``np.concatenate``) can't use
>   decorators, but they could still use a C equivalent of
>   ``try_array_function_override``. If performance is not a concern, they
>   could also be easily wrapped with a small Python wrapper.
> - The ``__call__`` method of ``np.vectorize`` can't be decorated with
>
>
> I would like to propose that we use `__array_function__` in the following
> manner for functions that create arrays (rough sketch below):
>
>    - `array_reference` for indicating the “reference array” whose
>    `__array_function__` implementation will be called. For example,
>    `np.arange(5, array_reference=some_dask_array)`.
>    - I use a reference in the design rather than a type because for some
>    arrays (such as Dask), chunk sizes or other reference data is needed to
>    make this work.
>
>
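> A rough sketch of what I have in mind (signatures simplified, and the
> `array_reference` keyword is of course hypothetical at this point):
>
>     # Hypothetical dispatcher for a creation function: only the reference
>     # array is checked for __array_function__.
>     def _arange_dispatcher(start, stop=None, step=None, dtype=None,
>                            array_reference=None):
>         if array_reference is not None:
>             yield array_reference
>
>     # The call would then dispatch to
>     # type(some_dask_array).__array_function__, which can use chunk sizes
>     # or other metadata from the reference array.
>     x = np.arange(5, array_reference=some_dask_array)
>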
> I realise that this is a big design decision, so I welcome any input!
>
> Best Regards,
> Hameer Abbasi
> Sent from Astro <https://www.helloastro.com> for Mac
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>