[Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules

Stephan Hoyer shoyer at gmail.com
Sun Feb 23 18:30:54 EST 2020


On Thu, Feb 6, 2020 at 12:20 PM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> > It is less clear how this could work for __array_module__, because
> > __array_module__ and get_array_module() are not generic -- they
> > refer explicitly to a NumPy-like module. If we want to extend it to
> > SciPy (for which I agree there are good use-cases), what should that
> > look like?
>
> I suppose the question is here, where should the code reside? For
> SciPy, I agree there is a good reason why you may want to "reverse" the
> implementation. The code to support JAX arrays should live inside JAX.
>
> One, probably silly, option is to return a "global" namespace, so that:
>
>     np = get_array_module(*arrays).numpy
>
>
My main concern with a "global namespace" is that it adds boilerplate to
the typical usage of fetching a duck-array version of NumPy.

I think the simplest proposal is to add a "module" argument to both
get_array_module and __array_module__, with a default value of "numpy".
This adds flexibility with minimal additional complexity.

The main question is what the type of arguments for "module" should be:
1. Modules could be specified as strings, e.g., "numpy"
2. Modules could be specified as actual namespaces, e.g., the numpy object
obtained from "import numpy".
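
To make the two options concrete, here is a minimal sketch. Everything in
it is a stand-in: NumPy does not ship get_array_module(), the namespaces
are SimpleNamespace objects rather than real modules, and both duck array
classes are hypothetical. It only shows how the "module" argument would be
spelled under each option.

```python
import types

# Hypothetical stand-ins for real modules, so the sketch runs without NumPy.
numpy_ns = types.SimpleNamespace(name="numpy")
duck_numpy_ns = types.SimpleNamespace(name="duck.numpy")


class DuckByName:
    """Option 1: the requested module is identified by a string."""

    def __array_module__(self, array_types, module="numpy"):
        if module == "numpy":
            return duck_numpy_ns
        return NotImplemented


class DuckByNamespace:
    """Option 2: the requested module is the namespace object itself."""

    def __array_module__(self, array_types, module=numpy_ns):
        if module is numpy_ns:
            return duck_numpy_ns
        return NotImplemented


def get_array_module(*arrays, module):
    """Simplified dispatcher: ask each array that implements the protocol."""
    array_types = {type(a) for a in arrays if hasattr(a, "__array_module__")}
    for array in arrays:
        if hasattr(array, "__array_module__"):
            result = array.__array_module__(array_types, module=module)
            if result is not NotImplemented:
                return result
    raise TypeError("no common array module found")


# Option 1: pass a string.
by_name = get_array_module(DuckByName(), module="numpy")
# Option 2: pass the namespace.
by_namespace = get_array_module(DuckByNamespace(), module=numpy_ns)
```

In both cases an unsupported "module" value makes the toy classes return
NotImplemented, so the dispatcher raises TypeError.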

The advantage of (1) is that in theory you could write
np.get_array_module(*arrays, module='scipy.linalg') without the overhead of
actually importing scipy.linalg or without even needing scipy to be
installed, if all the arrays use a different scipy.linalg implementation.
But in practice, this seems a little far-fetched. All alternative
implementations of scipy that I know of (e.g., in JAX or conceivably in
Dask) import the original library.

The main downside of (1) is that it would mean that NumPy's
ndarray.__array_module__ would need to use importlib.import_module() to
dynamically import modules. It also adds a potentially awkward asymmetry
between the "module" and "default" arguments, unless we also switched
default to specify modules with strings.
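
A rough sketch of what that dynamic import would look like is below. The
ToyNDArray class is hypothetical, and the standard-library names "math" and
"json.decoder" stand in for "numpy" and "scipy.linalg" so that the example
runs without either library installed.

```python
import importlib


class ToyNDArray:
    """Hypothetical stand-in for ndarray under option (1)."""

    def __array_module__(self, array_types, module="math"):
        # The name arrives as a string, so it has to be imported dynamically.
        return importlib.import_module(module)


arr = ToyNDArray()
default_ns = arr.__array_module__({ToyNDArray})
submodule_ns = arr.__array_module__({ToyNDArray}, module="json.decoder")
```

Note that an unknown name would surface as ImportError from inside
__array_module__, which is one more behavior the protocol would have to
specify under option (1).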

Either way, the "default" argument will probably need to be adjusted so
that by default it matches whatever value is passed into "module", instead
of always defaulting to "numpy".
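
One way to get that behavior is a sentinel for "default", sketched below
under option (2). This is a toy dispatcher, not NumPy code; math and cmath
stand in for numpy and scipy.linalg, and the _UNSET sentinel is my own
naming.

```python
import cmath
import math

_UNSET = object()  # sentinel: distinguishes "not passed" from default=None


def get_array_module(*arrays, module=math, default=_UNSET):
    """Toy dispatcher in which "default" follows "module" unless given."""
    if default is _UNSET:
        default = module
    implementing = [a for a in arrays if hasattr(a, "__array_module__")]
    if not implementing:
        if default is None:
            raise TypeError("no array implements __array_module__")
        return default
    array_types = {type(a) for a in implementing}
    for array in implementing:
        result = array.__array_module__(array_types, module=module)
        if result is not NotImplemented:
            return result
    raise TypeError("no common array module found")


# A plain float implements nothing, so the default is returned.
fallback = get_array_module(1.0, module=cmath)
```

Passing default=None still opts out of the fallback, so callers that want
an error for non-implementing inputs can keep that behavior.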

Any thoughts on which of these options makes most sense? We could also put
off making any changes to the protocol now, but this change seems pretty
safe and appears to have real use-cases (e.g., for sklearn) so I am inclined
to go ahead with it now before finalizing the NEP.


> We have two distinct issues: Where should e.g. SciPy put a generic
> implementation (assuming they want to provide implementations that only
> require NumPy-API support to not require overriding)?
> And, also if a library provides generic support, should we define a
> standard of how the context/namespace may be passed in/provided?
>
> sklearn's main namespace is expected to support many array
> objects/types, but it could be nice to pass in an already known
> context/namespace (say scikit-image already found it, and then calls
> scikit-learn internally). A "generic" namespace may even require this
> to infer the correct output array object.
>
>
> Another thing about backward compatibility: What is our vision there
> actually?
> This NEP will *not* give the *end user* the option to opt-in! Here,
> opt-in is really reserved to the *library user* (e.g. sklearn). (I did
> not realize this clearly before)
>
> Thinking about that for a bit now, that seems like the right choice.
> But it also means that the library requires an easy way of giving a
> FutureWarning, to notify the end-user of the upcoming change. The end-
> user will easily be able to convert to a NumPy array to keep the old
> behaviour.
> Once this warning is given (maybe during `get_array_module()`), the
> array module object/context would preferably be passed around,
> hopefully even between libraries. That provides a reasonable way to
> opt-in to the new behaviour without a warning (mainly for library
> users, end-users can silence the warning if they wish so).
>

I don't think NumPy needs to do anything about warnings. It is
straightforward for libraries that want to use get_array_module() to
issue their own warnings before calling get_array_module(), if desired.

Or alternatively, if a library is about to add a new __array_module__
method, it is straightforward to issue a warning inside the new
__array_module__ method before returning the NumPy functions.
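
Both alternatives can be sketched in a few lines. All names here are
hypothetical (the toy get_array_module(), the two duck classes,
library_function, and the warning messages); the only real API used is the
standard warnings module.

```python
import types
import warnings

# Hypothetical stand-in for the namespace that would be returned.
numpy_like_ns = types.SimpleNamespace(name="numpy-like")


def get_array_module(*arrays):
    """Toy stand-in for the proposed function (not a NumPy API)."""
    for array in arrays:
        if hasattr(array, "__array_module__"):
            return array.__array_module__({type(a) for a in arrays})
    raise TypeError("no array module found")


class QuietDuck:
    def __array_module__(self, array_types):
        return numpy_like_ns


class WarningDuck:
    """Alternative 2: the array type warns from its new method."""

    def __array_module__(self, array_types):
        warnings.warn(
            "WarningDuck now implements __array_module__; results may "
            "change type in a future release.",
            FutureWarning,
            stacklevel=2,
        )
        return numpy_like_ns


def library_function(x):
    """Alternative 1: the library warns before it dispatches."""
    warnings.warn(
        "library_function will start returning duck arrays for duck-array "
        "inputs; convert to a NumPy array to keep the old behaviour.",
        FutureWarning,
        stacklevel=2,
    )
    return get_array_module(x)
```

Either way the warning lives in the library or array type that is changing
behavior, which is why nothing extra is needed in NumPy itself.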