[Numpy-discussion] Adding to the non-dispatched implementation of NumPy methods

Nathaniel Smith njs at pobox.com
Thu Apr 25 16:50:16 EDT 2019


On Thu, Apr 25, 2019 at 10:10 AM Stephan Hoyer <shoyer at gmail.com> wrote:
>
> On Wed, Apr 24, 2019 at 9:56 PM Nathaniel Smith <njs at pobox.com> wrote:
>>
>> When you say "numpy array specific" and
>> "__numpy_(nd)array_implementation__", that sounds to me like you're
>> trying to say "just step 3, skipping steps 1 and 2"? Step 3 is the one
>> that operates on ndarrays...
>
> My thinking was that if we implement NumPy functions with duck typing (e.g., `np.stack()` in terms of `.shape` + `np.concatenate()`), then step (3) could in some sense be the generic "array implementation", not only for NumPy arrays.

Okay right, so roughly speaking there are two different types of
functions that support __array_function__:

* "Core" numpy functions that typically do implicit coercion and then
iterate over raw memory
* "Derived" functions, the kind of thing that could just as well be
implemented in another library or end-user code, and often are... but
since these ones happen to be in the numpy package namespace, they
support __array_function__.

There are probably some weird cases that don't fall neatly into either
category, but I think the distinction is at least useful for
organizing our thoughts.
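As a rough illustration of the "derived" category, here is a stack-like
function written only in terms of `.shape`, `np.expand_dims()` and
`np.concatenate()`, along the lines of the `np.stack()` example quoted
above. It is a simplified sketch (equal shapes and a non-negative axis
only), not NumPy's actual source, and `derived_stack` is a hypothetical
name.

```python
import numpy as np


def derived_stack(arrays, axis=0):
    """Hypothetical 'derived' stack built only from other public NumPy operations.

    Simplified: assumes all inputs share one shape and 0 <= axis <= ndim.
    """
    arrays = [np.asanyarray(a) for a in arrays]
    if len({a.shape for a in arrays}) != 1:
        raise ValueError("all input arrays must have the same shape")
    # Insert a new length-1 axis, then let the "core" function do the real work.
    expanded = [np.expand_dims(a, axis) for a in arrays]
    return np.concatenate(expanded, axis=axis)
```

Everything here could live in an end-user library; the work on raw
memory happens inside `np.concatenate()`, which plays the "core" role in
this picture.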

>>
>> When we have some kind of __asduckarray__ coercion, then that will
>> complicate things too, because presumably we'll do something like
>>
>> 1. __array_function__ dispatch
>> 2. __asduckarray__ coercion
>> 3. __array_function__ dispatch again
>> 4. ndarray coercion
>> 5. [either "the implementation", or __array_function__ dispatch again,
>> depending on how you want to think about it]
>
>
> I was thinking of something a little simpler: do __asduckarray__ rather than numpy.ndarray coercion inside the implementation of NumPy functions. Then making use of NumPy's implementations would be a matter of calling the NumPy implementation without ndarray coercion from inside __array_function__.
>
> e.g.,
>
> class MyArray:
>     def __duck_array__(self):
>         return self
>     def __array_function__(self, func, types, args, kwargs):
>         ...
>         if func in {np.stack, np.atleast_1d, ...}:
>             # use NumPy's "duck typing" implementations for these functions
>             return func.__duck_array_implementation__(*args, **kwargs)
>         elif func == np.concatenate:
>             # write my own version of np.concatenate
>             ...
>
> This would let you make use of duck typing in a controlled way if you use __array_function__. np.stack.__duck_array_implementation__ would look exactly like np.stack, except np.asanyarray() would be replaced by np.asduckarray().
>
> The reason why we need the separate __duck_array_implementation__ and __numpy_array_implementation__/__skipping_array_function__ is because there are also use cases where you *don't* want to worry about how np.stack is implemented under the hood (i.e., in terms of np.concatenate), and want to go straight to the coercive numpy.ndarray implementation. This lets you avoid both the complexity and overhead associated with further dispatch checks.
>
> I don't think we want repeated dispatching with __array_function__. That seems like a recipe for slow performance and confusion.

I don't understand this part, but it makes me worry that instead of
designing something that fits together based on some underlying
logical framework, you're just throwing more and more hooks at things
in the hope that, if 3rd-party libraries have enough hooks, they'll be
able to somehow monkeypatch things into working most of the time if
you don't look too hard :-/. I hope that's wrong.

Stepping back a bit:

My objection to the phrase "numpy implementation" has been that
"implementation" is one of those words like "low level", whose meaning
completely changes depending on which part of the system you happen to
be thinking about when you say it. I think I see what you're getting
at now, though; you've been working on adding __array_function__
dispatch, and from the perspective of a wrapper function implementing
__array_function__ dispatch, there's a clear distinction between the
caller, the dispatch, and then the fallback "implementation" that it
delegates to if no __array_function__ methods were found. The wrapper
treats the fallback function like a black box.

That's an internally consistent approach, and if you want
__array_function__ to work on "derived" functions like np.stack...
well, they're just arbitrary Python functions, so you *have* to treat
the fallback like a black box, and __array_function__ dispatch as a
cleanly decoupled step.

And if that's the model for __array_function__, then it makes perfect
sense to talk about skipping the __array_function__ dispatch step. I
think the word "implementation" is too vague, but the idea makes
sense.
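The wrapper model described above can be sketched in plain Python.
`with_dispatch`, `_stack_dispatcher`, `Logged` and the `fallback`
attribute are all hypothetical names for illustration; NumPy's real
machinery differs in detail. The wrapper collects the arguments that
define `__array_function__`, offers each one the call, and only if all of
them decline does it call the fallback, which it never looks inside.

```python
import functools


def with_dispatch(dispatcher):
    """Hypothetical decorator sketching __array_function__-style dispatch.

    `dispatcher` returns the arguments that may carry an override; the
    decorated function is the fallback, treated as an opaque black box.
    """
    def decorator(fallback):
        @functools.wraps(fallback)
        def public_api(*args, **kwargs):
            relevant = [a for a in dispatcher(*args, **kwargs)
                        if hasattr(type(a), "__array_function__")]
            types = tuple(dict.fromkeys(type(a) for a in relevant))
            for arg in relevant:
                result = type(arg).__array_function__(arg, public_api, types, args, kwargs)
                if result is not NotImplemented:
                    return result
            # No override claimed the call, so delegate to the fallback.
            return fallback(*args, **kwargs)

        # Hypothetical handle for callers who want to skip the dispatch step.
        public_api.fallback = fallback
        return public_api
    return decorator


def _stack_dispatcher(arrays, axis=0):
    return arrays


@with_dispatch(_stack_dispatcher)
def stack(arrays, axis=0):
    # Stand-in fallback: just copies a list of lists (axis is ignored).
    return [list(a) for a in arrays]


class Logged:
    """Toy duck array whose override reports which function it intercepted."""

    def __array_function__(self, func, types, args, kwargs):
        return ("override", func.__name__)
```

Calling `stack.fallback(...)` directly is one way to picture "skipping
the __array_function__ dispatch step": the caller goes straight to the
black box without any override checks.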

The thing I didn't realize until these last few posts, though, is that
if this is the model for __array_function__, then it means you *have*
to treat the fallback as a black box. Which means that
__array_function__ cannot be integrated into numpy's coercion rules,
which are inside the black box. And duck arrays need to be integrated
into numpy's coercion rules, because you have to be able to coerce to
a duck array before calling whatever special methods it has.  So
therefore... duck arrays cannot use __array_function__? That seems
like an unfortunate conclusion but I don't see any way around it.
Like, for a concrete example: if obj1 has an __asduckarray__ method,
and that returns obj2 with __array_ufunc__, then I would absolutely
expect np.sin(obj1) to end up calling obj2.__array_ufunc__. But if
__array_function__ is a decoupled step applicable to arbitrary
functions, then np.sin(obj1) can't call obj2.__array_function__.
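To make the concrete example runnable: `__array_ufunc__` is NumPy's real
ufunc override protocol, but `__asduckarray__` and the `asduckarray()`
helper are hypothetical (they are only being discussed in this thread),
and `duck_sin` stands in for an `np.sin` whose coercion step knows about
duck arrays. What the sketch shows is ordering: obj2's `__array_ufunc__`
can only be reached after the coercion step has produced obj2.

```python
import numpy as np


class Obj2:
    """Duck array implementing NumPy's real __array_ufunc__ protocol."""

    def __init__(self, values):
        self.values = np.asarray(values, dtype=float)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__":
            return NotImplemented
        unwrapped = [x.values if isinstance(x, Obj2) else x for x in inputs]
        return Obj2(getattr(ufunc, method)(*unwrapped, **kwargs))


class Obj1:
    """Not a duck array itself; only knows how to convert to one (hypothetical hook)."""

    def __init__(self, values):
        self.values = values

    def __asduckarray__(self):
        return Obj2(self.values)


def asduckarray(obj):
    """Hypothetical coercion step: use __asduckarray__ if present, else ndarray."""
    method = getattr(type(obj), "__asduckarray__", None)
    if method is not None:
        return method(obj)
    return np.asarray(obj)


def duck_sin(x):
    """Sketch of an np.sin whose coercion step runs before ufunc dispatch."""
    return np.sin(asduckarray(x))
```

By contrast, a decoupled __array_function__ wrapper around `duck_sin`
would inspect only obj1, before any coercion, so obj2's
`__array_function__` would never be consulted.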

Alternatively, we could make __array_function__ part of numpy's
standard coercion/dispatch sequence, but then it doesn't make much
sense for np.stack to do __array_function__ dispatch.
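For comparison, the integrated alternative corresponds roughly to the
five-step sequence quoted earlier (dispatch, __asduckarray__ coercion,
dispatch again, ndarray coercion, implementation). A minimal sketch,
assuming a hypothetical `__asduckarray__` hook and ignoring keyword
arguments; `integrated_call`, `Duck` and `Wrapper` are illustrative
names, not NumPy API.

```python
import numpy as np


def _try_overrides(func, args):
    """Return the first non-NotImplemented __array_function__ result, if any."""
    relevant = [a for a in args
                if not isinstance(a, np.ndarray) and hasattr(type(a), "__array_function__")]
    types = tuple(dict.fromkeys(type(a) for a in relevant))
    for arg in relevant:
        result = type(arg).__array_function__(arg, func, types, args, {})
        if result is not NotImplemented:
            return result
    return NotImplemented


def _asduckarray(obj):
    # Hypothetical hook name taken from the discussion; not a real NumPy API.
    method = getattr(type(obj), "__asduckarray__", None)
    return method(obj) if method is not None else obj


def integrated_call(func, implementation, *args):
    result = _try_overrides(func, args)             # 1. dispatch
    if result is not NotImplemented:
        return result
    ducks = tuple(_asduckarray(a) for a in args)    # 2. __asduckarray__ coercion
    result = _try_overrides(func, ducks)            # 3. dispatch again
    if result is not NotImplemented:
        return result
    arrays = tuple(np.asarray(a) for a in ducks)    # 4. ndarray coercion
    return implementation(*arrays)                  # 5. the implementation


class Duck:
    """Toy duck array that claims every call offered to it."""

    def __array_function__(self, func, types, args, kwargs):
        return "handled by Duck"


class Wrapper:
    """Toy object that is not a duck array but can be coerced to one."""

    def __asduckarray__(self):
        return Duck()
```

Because dispatch here is woven into coercion, a derived function such as
`np.stack` would pick up overrides through the core functions it calls,
rather than needing a dispatch wrapper of its own.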

I guess this is just another manifestation of the trade-off we accepted
when we decided to implement __array_function__ instead of finer-grained,
semantically-integrated hooks like
__array_concatenate__, and I shouldn't expect __array_function__ to be
useful for duck arrays?
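For contrast, a finer-grained hook might look something like the
following. `__array_concatenate__` is not an existing NumPy protocol;
it, `concatenate`, `append_rows` and the toy `Chunked` class are all
hypothetical. The derived function needs no hook of its own: a duck
array that hooks the core operation gets it for free.

```python
import numpy as np


def concatenate(arrays, axis=0):
    """Core operation with a hypothetical fine-grained __array_concatenate__ hook."""
    for a in arrays:
        hook = getattr(type(a), "__array_concatenate__", None)
        if hook is not None:
            result = hook(a, arrays, axis)
            if result is not NotImplemented:
                return result
    # No hook claimed the call: coerce and use NumPy's ndarray implementation.
    return np.concatenate([np.asarray(a) for a in arrays], axis=axis)


def append_rows(arr, values):
    """Derived function: no hook of its own, it just calls the core operation."""
    return concatenate([arr, values], axis=0)


class Chunked:
    """Toy duck array: a lazy list of ndarray chunks along axis 0."""

    def __init__(self, chunks):
        self.chunks = list(chunks)

    def __array_concatenate__(self, arrays, axis):
        if axis != 0 or not all(isinstance(a, Chunked) for a in arrays):
            return NotImplemented
        # Concatenating just joins the chunk lists; no data is copied.
        return Chunked(c for a in arrays for c in a.chunks)
```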

I don't have a conclusion but I'd like to know what you think about
the above :-).

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


More information about the NumPy-Discussion mailing list