[Numpy-discussion] Adding to the non-dispatched implementation of NumPy methods

Ralf Gommers ralf.gommers at gmail.com
Fri Apr 26 04:03:50 EDT 2019


On Fri, Apr 26, 2019 at 1:02 AM Stephan Hoyer <shoyer at gmail.com> wrote:

> On Thu, Apr 25, 2019 at 3:39 PM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
>
>>
>> On Fri, Apr 26, 2019 at 12:04 AM Stephan Hoyer <shoyer at gmail.com> wrote:
>>
>>> I do like the look of this, but keep in mind that there is a downside to
>>> exposing the implementation of NumPy functions -- now the implementation
>>> details become part of NumPy's API. I suspect we do not want to commit
>>> ourselves to never changing the implementation of NumPy functions, so at
>>> the least this will need careful disclaimers about non-guarantees of
>>> backwards compatibility.
>>>
>>
>> I honestly still am missing the point of claiming this. There is no
>> change either way to what we've done for the last decade. If we change
>> anything in the numpy implementation of any function, we use deprecation
>> warnings etc. What am I missing here?
>>
>
> Hypothetically, suppose we rewrite np.stack() in terms of np.block()
> instead of np.concatenate(), because it turns out to be faster.
>
> As long as we're coercing with np.asarray(), users don't notice any
> material difference -- their code just gets a little faster.
>
> But this could be problematic if we support duck typing. For example,
> suppose dask arrays rely on NumPy's definition of np.stack in terms of
> np.concatenate, but they never bothered to implement np.block. Now
> upgrading NumPy breaks dask.
>
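
(To make the quoted example concrete, here is a minimal sketch; the class
and its behaviour are invented purely for illustration: a duck array that
handles np.concatenate through __array_function__ but never implemented
np.block only keeps working as long as NumPy's internals stick to
np.concatenate.

import numpy as np

class MinimalDuckArray:
    """Toy duck array: supports np.concatenate, but not np.block."""
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.concatenate:
            arrays = [a.data if isinstance(a, MinimalDuckArray) else a
                      for a in args[0]]
            return MinimalDuckArray(
                np.concatenate(arrays, *args[1:], **kwargs))
        return NotImplemented  # np.block and everything else: unsupported

a = MinimalDuckArray([1, 2])
b = MinimalDuckArray([3, 4])
np.concatenate([a, b])  # works: dispatches to the override above
# np.block([[a, b]])    # TypeError: no implementation found for np.block

If np.stack's internals switched from np.concatenate to np.block, the
second call is what such an array would suddenly hit, even though
np.stack's public signature never changed.)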

Thanks, this helped clarify what's going on here. This example is clear.
The problem seems to be that there's two separate discussions in this
thread:
1. Your original proposal, __numpy_implementation__. It does not have the
problem of your np.stack/np.concatenate example, as the "numpy
implementation" is exactly the same as it is today (a toy sketch of this
follows below).
2. Splitting up the current numpy implementation into *multiple* entry
points. This could be with or without coercion, with or without checking
for invalid values, etc. (sketched further below).
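
To illustrate what (1) amounts to, here is a toy sketch. The decorator and
function below are stand-ins, not NumPy's actual code; only the attribute
name __numpy_implementation__ is the one proposed in this thread. The
decorator that already wraps every dispatched function would simply attach
the unchanged, non-dispatched implementation under a fixed name:

import functools
import numpy as np

def expose_implementation(implementation):
    """Toy stand-in for NumPy's dispatch decorator (illustration only)."""
    @functools.wraps(implementation)
    def public_api(*args, **kwargs):
        # ... __array_function__ dispatch would happen here ...
        return implementation(*args, **kwargs)
    # The only new piece in (1): expose the implementation, unchanged,
    # under a well-known attribute name.
    public_api.__numpy_implementation__ = implementation
    return public_api

@expose_implementation
def stack(arrays, axis=0):
    # stands in for np.stack's existing implementation, untouched
    expanded = [np.expand_dims(np.asarray(a), axis) for a in arrays]
    return np.concatenate(expanded, axis=axis)

# The exposed object is literally the same function that already runs today:
assert stack.__numpy_implementation__ is stack.__wrapped__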

So far NEP 18 does (1). Your proposed __numpy_implementation__ addition to
NEP 18 is still (1). Claiming that this affects the situation with respect
to backwards compatibility is incorrect.

(2) is actually a much more invasive change, and one that does much more to
increase the size of the NumPy API surface. And yes, it affects our
backwards compatibility situation as well.
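
For comparison, a rough sketch of what (2) could look like. None of these
function names exist in NumPy; they are made up purely to show how the
entry points multiply:

import numpy as np

def stack_skipping_coercion(arrays, axis=0):
    """Hypothetical duck-array-friendly entry point: no asarray, no checks."""
    expanded = [np.expand_dims(a, axis) for a in arrays]
    return np.concatenate(expanded, axis=axis)

def stack_with_coercion(arrays, axis=0):
    """Roughly what np.stack does today: coerce and validate, then operate."""
    arrays = [np.asarray(a) for a in arrays]
    if len({a.shape for a in arrays}) > 1:
        raise ValueError("all input arrays must have the same shape")
    return stack_skipping_coercion(arrays, axis=axis)

Every such split is a new public name whose behaviour has to be documented,
kept stable, and reasoned about for both ndarrays and duck arrays.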

Also note that these have very different purposes:
(1) was to (quoting from the NEP) "allow using NumPy as a high level API
for efficient multi-dimensional array operations, even with array
implementations that differ greatly from numpy.ndarray."
(2) is for making duck arrays work with numpy implementations of functions
(not just with the NumPy API)

I think (1) is mostly achieved, and I'm +1 on your NEP addition for that.
(2) is quickly becoming a mess, and I agree with Nathaniel's sentiment
above "I shouldn't expect __array_function__ to be useful for duck
arrays?". For (2) we really need to go back and have a well thought out
design. Hameer's mention of uarray could be that. Growing more __array_*__
protocols in a band-aid fashion seems unlikely to get us there.


> This is basically the same reason why subclass support has been hard to
> maintain in NumPy. Apparently safe internal changes to NumPy functions can
> break other array types in surprising ways, even if they do not
> intentionally deviate from NumPy's semantics.
>

Agreed. That is why optionally skipping asarray & co should be a separate
discussion. It's part of the problem caused by numpy trying to be both a
library and an end user interface, and often those goals conflict.

Cheers,
Ralf