[Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API

Sat Sep 7 16:33:35 EDT 2019

On Sat, Sep 7, 2019 at 1:07 PM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote:
> >
> >
> <snip>
>
> > > That's part of it. The concrete problems it's solving are
> > > threefold:
> > > Array creation functions can be overridden.
> > > Array coercion is now covered.
> > > "Default implementations" will allow you to re-write your NumPy
> > > array more easily, when such efficient implementations exist in
> > > terms of other NumPy functions. That will also help achieve similar
> > > semantics, but as I said, they're just "default"...
> > >
> >
> > There may be another very concrete one (that's not yet in the NEP):
> > allowing other libraries that consume ndarrays to use overrides. An
> > example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch
> > NumPy, something we don't like all that much (in particular for
> > mkl_fft, because it's the default in Anaconda). `__array_function__`
> > isn't able to help here, because it will always choose NumPy's own
> > implementation for ndarray input. With unumpy you can support
> > multiple libraries that consume ndarrays.
> >
> > Another example is einsum: if you want to use opt_einsum for all
> > inputs (including ndarrays), then you cannot use np.einsum. And yet
> > another is using bottleneck (
> > https://kwgoodman.github.io/bottleneck-doc/reference.html) for nan-
> > functions and partition. There's likely more of these.
> >
> > The point is: sometimes the array protocols are preferred (e.g.
> > Dask/Xarray-style meta-arrays), sometimes unumpy-style dispatch works
> > better. It's also not necessarily an either or, they can be
> > complementary.
> >
>
> Let me try to move the discussion from the github issue here (this may
> not be the best place). (https://github.com/numpy/numpy/issues/14441
> which asked for easier creation functions together with
> `__array_function__`).
>
> I think an important note mentioned here is how users interact with
> unumpy vs. __array_function__. The former is an explicit opt-in, while
> the latter is an implicit choice based on an `array-like` abstract base
> class and functional type-based dispatching.
>
> To quote NEP 18 on this: "The downsides are that this would require an
> explicit opt-in from all existing code, e.g., import numpy.api as np,
> and in the long term would result in the maintenance of two separate
> NumPy APIs. Also, many functions from numpy itself are already
> overloaded (but inadequately), so confusion about high vs. low level
> APIs in NumPy would still persist."
> (I do think this is a point we should not just ignore, `uarray` is a
> thin layer, but it has a big surface area)
>
> Now there are things where explicit opt-in is obvious. And the FFT
> example is one of those, there is no way to implicitly choose another
> backend (except by just replacing it, i.e. monkeypatching) [1]. And
> right now I think these are _very_ different.
>
>
> Now for end users, choosing one array-like over another seems nicer
> as an implicit mechanism (why should I not mix sparse, dask and numpy
> arrays!?). This is the promise `__array_function__` tries to make.
> Unless convinced otherwise, my guess is that most library authors would
> strive for implicit support (i.e. sklearn, skimage, scipy).
>
> Circling back to creation and coercion. In a purely Object type system,
> these would be classmethods, I guess, but in NumPy and the libraries
> above, we are lost.
>
> Solution 1: Create explicit opt-in, e.g. through uarray. (NEP-31)
>   * Required end-user opt-in.
>   * Seems cleaner in many ways
>   * Requires a full copy of the API.
>

Bullets 1 and 3 are not required: if we decide to make it the default,
then there's no separate namespace.

> Solution 2: Add some coercion "protocol" (NEP-30) and expose a way to
> create new arrays more conveniently. This would practically mean adding
> an `array_type=np.ndarray` argument.
>   * _Not_ used by end-users! End users should use dask.linspace!
>   * Adds "strange" API somewhere in numpy, and possibly a new
>     "protocol" (additionally to coercion).[2]
>
> I still feel these solve different issues. The second one is intended
> to make array likes work implicitly in libraries (without end users
> having to do anything). While the first seems to force the end user to
> opt in, sometimes unnecessarily:
>
> def my_library_func(array_like):
>    exp = np.exp(array_like)
>    idx = np.arange(len(exp))
>    return idx, exp
>
> Would have all the information for implicit opt-in/Array-like support,
> but cannot do it right now.

Can you explain this a bit more? `len(exp)` is a number, so
`np.arange(number)` doesn't really have any information here.
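
A toy sketch of what I mean, in pure Python. This is not NumPy's actual
dispatch code: `__toy_function__`, `dispatchable` and `Duck` are made-up
names that only mimic the idea behind `__array_function__`. The override is
found by inspecting the arguments, so a call whose only argument is a plain
integer has nothing to dispatch on.

```python
import functools


def dispatchable(default_impl):
    """Toy stand-in for type-based dispatch in the style of __array_function__."""
    @functools.wraps(default_impl)
    def wrapper(*args):
        for arg in args:
            override = getattr(type(arg), "__toy_function__", None)
            if override is not None:
                # The type of one of the arguments claims the call.
                return override(arg, wrapper, args)
        # No argument carries an override: use the default implementation.
        return default_impl(*args)
    return wrapper


@dispatchable
def exp(x):
    return ("default exp", x)


@dispatchable
def arange(n):
    return ("default arange", n)


class Duck:
    """Hypothetical array-like that overrides every dispatchable function."""
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __toy_function__(self, func, args):
        return Duck(self.n)


def my_library_func(array_like):
    e = exp(array_like)      # dispatches: the argument is a Duck
    idx = arange(len(e))     # cannot dispatch: the argument is an int
    return idx, e


idx, e = my_library_func(Duck(3))
```

Here `e` is a `Duck`, while `idx` comes from the default `arange`, because
the integer argument says nothing about which array type was intended.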

> This is what I have been wondering: whether
> uarray/unumpy can in some way help me make this work (even _without_
> the end user opting in).

Good question. If that needs to work in the absence of the user doing
anything, it should be something like

with unumpy.determine_backend(exp):
   unumpy.arange(len(exp))   # or np.arange if we make unumpy default

to get the equivalent to `np.arange_like(len(exp), array_type=exp)`.

Note that the `determine_backend` thing doesn't exist today.

> The reason is simply that right now I am very
> clear on the need for this use case, but not sure about the need for
> end user opt in, since end users can just use dask.arange().
>

I don't get the last part. The arange is inside a library function, so a
user can't just go in and change things there.

Cheers,
Ralf

>
> Cheers,
>
> Sebastian
>
>
> [1] To be honest, I do think a lot of the "issues" around
> monkeypatching exist just as much with backend choosing; the main
> differences seem to me to be that:
>    1. monkeypatching was not done explicitly
>       (import mkl_fft; mkl_fft.monkeypatch_numpy())?
>    2. A backend system allows libraries to prefer one locally?
>       (which I think is a big advantage)
>
> [2] There are the options of adding `linspace_like` functions somewhere
> in a numpy submodule, or adding `linspace(..., array_type=np.ndarray)`,
> or simply inventing a new "protocol" (which is not really a protocol?),
> and make it `ndarray.__numpy_like_creation_functions__.arange()`.
>
>
>
> > Actually, after writing this I just realized something. With 1.17.x
> > we have:
> >
> > ```
> > In [1]: import dask.array as da
> >
> > In [2]: d = da.from_array(np.linspace(0, 1))
> >
> > In [3]: np.fft.fft(d)
> > Out[3]: dask.array<fft, shape=(50,), dtype=complex128, chunksize=(50,)>
> > ```
> >
> > In Anaconda `np.fft.fft` *is* `mkl_fft._numpy_fft.fft`, so this won't
> > work. We have no bug report yet because 1.17.x hasn't landed in conda
> > defaults yet (perhaps this is a/the reason why?), but it will be a
> > problem.
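
The failure mode can be illustrated with a toy model in pure Python. It
assumes the replacement function does no dispatch of its own; `toy_np`,
`with_override`, `fast_fft` and `__toy_function__` are invented names, not
anything from NumPy or mkl_fft.

```python
import types


def with_override(default_impl):
    """Toy wrapper that lets the argument's type take over the call."""
    def wrapper(x):
        override = getattr(type(x), "__toy_function__", None)
        if override is not None:
            return override(x, wrapper)
        return default_impl(x)
    return wrapper


@with_override
def _fft(x):
    return "default fft"


# Stand-in for a module namespace whose attributes can be replaced.
toy_np = types.SimpleNamespace(fft=_fft)


class DaskLike:
    """Stand-in for an array type that overrides fft."""
    def __toy_function__(self, func):
        return "dask fft"


before = toy_np.fft(DaskLike())   # the wrapper hands the call to DaskLike


def fast_fft(x):
    # Hypothetical replacement that knows nothing about overrides.
    return "fast fft"


toy_np.fft = fast_fft             # the monkeypatch
after = toy_np.fft(DaskLike())    # the override is never consulted
```

Once the attribute is replaced, the dispatching wrapper is gone along with
the default implementation, so the array type's override is bypassed.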
> >
> > > The import numpy.overridable part is meant to help garner adoption,
> > > and to prefer the unumpy module if it is available (which will
> > > continue to be developed separately). That way it isn't so tightly
> > > coupled to the release cycle. One alternative Sebastian Berg
> > > mentioned (and I am on board with) is just moving unumpy into the
> > > NumPy organisation. What we fear about keeping it separate is that the
> > > simple act of a pip install unumpy will keep people from using it
> > > or trying it out.
> > >
> > Note that this is not the most critical aspect. I pushed for
> > vendoring as numpy.overridable because I want to not derail the
> > comparison with NEP 30 et al. with a "should we add a dependency"
> > discussion. The interesting part to decide on first is: do we need
> > the unumpy override mechanism? Vendoring opt-in vs. making it default
> > vs. adding a dependency is of secondary interest right now.
> >
> > Cheers,
> > Ralf
> >
> >
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion