[Numpy-discussion] NEP: Dispatch Mechanism for NumPy’s high level API

Marten van Kerkwijk m.h.vankerkwijk at gmail.com
Sun Jun 3 19:23:58 EDT 2018


>
> In most cases, I suspect that the overhead of a function call and checking
> several arguments for "__array_function__" will be negligible, like the
> situation for __array_ufunc__. I'm not strongly opposed to either of your
> proposed solutions, but I do think it would be a little strange to insist
> that we need a solution for __array_function__ when __array_ufunc__ was
> fine.
>

Ufuncs actually do try to speed up array checks - but indeed the same can
(and should) be done for `__array_ufunc__`. They also do have `subok`. This
is currently ignored, but that is mostly because looking for it in `kwargs`
is so damn slow!

Anyway, my main point was that it should be explicitly mentioned as a
constraint that for pure ndarray input, things should be really fast.


>
> A. Two "namespaces", one for the undecorated base functions, and one
>> completely trivial one for the decorated ones. The idea would be that if
>> one knows one is dealing with arrays only, one would do `import
>> numpy.array_only as np` (i.e., the reverse of the suggestion currently in
>> the NEP, where the decorated ones are in their own namespace - I agree with
>> the reasons for discounting that one).
>>
>
> I will mention this as a possibility.
>
> I do think there is something to be said for clear separation of
> overloaded and non-overloaded APIs. But if I were to choose between adding
> numpy.api and numpy.array_only, I would pick numpy.api, because of the
> virtue of preserving the existing numpy namespace as it currently exists.
>

Good point. Overall, separate namespaces are probably not the way to go.


>
> B. Automatic insertion by the decorator of an `array_only=np._NoValue` (or
>> `coerce` and perhaps `subok=...` if not present) in the function signature,
>> so that users who know that they have arrays only could pass
>> `array_only=True` (name to be decided).
>>
>
> Rather than adding another argument to every NumPy function, I would
> rather encourage writing np.asarray() explicitly.
>

Good point - that is just as good, as long as the check for all-array is
very fast (which it should be - `arg.__class__ is np.ndarray` is fast!).
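As a minimal sketch of the kind of check meant here (the helper name
`all_base_ndarrays` and the `MyArray` subclass are made up for illustration,
not anything in NumPy or the NEP), an exact class comparison separates pure
ndarray input from everything else without any attribute lookup for
overrides:

```python
import numpy as np


def all_base_ndarrays(args):
    """True only if every argument is exactly np.ndarray (no subclasses)."""
    return all(arg.__class__ is np.ndarray for arg in args)


class MyArray(np.ndarray):
    """Hypothetical subclass, used only to show that it fails the check."""


plain = np.arange(3.0)
sub = plain.view(MyArray)

print(all_base_ndarrays((plain, plain)))  # pure ndarray input
print(all_base_ndarrays((plain, sub)))    # a subclass is present
print(all_base_ndarrays((plain, 1.0)))    # a Python scalar is present
```

Only the first call passes; anything else would presumably go through the
slower override lookup, or through an explicit `np.asarray()` by the caller.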


> Note that both A and B could also address, at least partially, the problem
>> of sometimes wanting to just use the old coercion methods, i.e., not having
>> to implement every possible numpy function in one go in a new
>> `__array_function__` on one's class.
>>
>
> Yes, agreed.
>
>
>> 1. I'm rather unclear about the use of `types`. It can help me decide
>> what to do, but I would still have to find the argument in question (e.g.,
>> for Quantity, the unit of the relevant argument). I'd recommend passing
>> instead a tuple of all arguments that were inspected, in the inspection
>> order; after all, it is just an `arg.__class__` away from the type, and in
>> your example you'd only have to replace `issubclass` by `isinstance`.
>>
>
> The virtue of a `types` argument is that we can deduplicate arguments
> once, rather than in each __array_function__ check. This could result in
> significantly more efficient code, e.g., when np.concatenate() is called on
> 10,000 arrays with only two unique types, we don't need to loop through all
> 10,000 objects again to check that overloading is valid.
>

I think one might still want to know *where* the type occurs (e.g., as an
output or index would have different implications). Possibly, a solution
would rely on the same structure as used for the "dance". But as a general
point, I don't see the advantage of passing types rather than arguments -
less information for no benefit.
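To make the trade-off concrete, here is a rough pure-Python sketch. The
classes `Plain` and `WithUnit` and the helper `unique_types` are stand-ins
invented for this illustration; they are not the NEP's implementation.

```python
def unique_types(args):
    """Collect each argument's class once, in the order first seen."""
    types = []
    for arg in args:
        if arg.__class__ not in types:
            types.append(arg.__class__)
    return tuple(types)


class Plain:
    """Stand-in for an ordinary array type (hypothetical)."""


class WithUnit(Plain):
    """Stand-in for a Quantity-like subclass (hypothetical)."""


args = [Plain() for _ in range(9_999)] + [WithUnit()]

# With `types`: deduplicate once, then each override checks two classes.
types = unique_types(args)
ok_from_types = all(issubclass(t, Plain) for t in types)

# With the arguments themselves: same answer, but 10,000 checks per override.
ok_from_args = all(isinstance(a, Plain) for a in args)

# What `types` alone cannot tell you: where the special argument sits.
where = [i for i, a in enumerate(args) if isinstance(a, WithUnit)]
```

Both checks agree, so the difference is cost versus information: `types` is
cheaper to test, while the arguments also reveal position.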


> Even for Quantity, I suspect you will want two layers of checks:
> 1. A check to verify that every argument is a Quantity (or something
> coercible to a Quantity). This could use `types` and return
> `NotImplemented` when it fails.
> 2. A check to verify that units match. This will have custom logic for
> different operations and will require checking all arguments -- not just
> their unique types.
>

Not sure. With Quantity, I generally do not worry about other types, but
rather look at units attributes, assume anything without is dimensionless,
cast Quantity to array with the right unit, and then defer to `ndarray`.
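Roughly, that approach looks like the toy sketch below. `ToyQuantity`, the
`_TO_METRE` table, and `toy_concatenate` are all made up for this example
and are far simpler than astropy's actual `Quantity`.

```python
import numpy as np

# Hypothetical conversion table: scale factors to metres.
_TO_METRE = {"m": 1.0, "cm": 0.01, "km": 1000.0}


class ToyQuantity:
    """Toy stand-in for a Quantity: an array plus a unit string."""

    def __init__(self, value, unit):
        self.value = np.asarray(value, dtype=float)
        self.unit = unit


def as_array_in(arg, unit):
    """Return `arg` as a plain ndarray expressed in `unit`."""
    arg_unit = getattr(arg, "unit", None)  # no unit attribute: dimensionless
    if arg_unit is None or unit is None:
        if arg_unit is not unit:
            raise ValueError("cannot mix dimensionless and dimensioned input")
        return np.asarray(getattr(arg, "value", arg), dtype=float)
    return arg.value * (_TO_METRE[arg_unit] / _TO_METRE[unit])


def toy_concatenate(args):
    """Convert everything to the first argument's unit, then defer to ndarray."""
    unit = getattr(args[0], "unit", None)
    arrays = [as_array_in(arg, unit) for arg in args]
    return ToyQuantity(np.concatenate(arrays), unit)


q = toy_concatenate([ToyQuantity([1.0, 2.0], "m"), ToyQuantity([2.0], "km")])
print(q.value, q.unit)
```

Note that no type check on the other arguments appears anywhere; only the
`unit` attribute is consulted.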


> For many Quantity functions, the second check will indeed probably be
> super simple (i.e., verifying that all units match). But the first check
> (with `types`) really is something that basically every overload should do.
>
>
>> 2. For subclasses, it would be very handy to have
>> `ndarray.__array_function__`, so one can call super after changing
>> arguments. (For `__array_ufunc__`, there were lots of questions about
>> whether this was useful, but it really is!!) (I think you already agreed
>> with this, but want to have it in-place, as for subclasses of ndarray this
>> is just as useful as it would be for subclasses of dask arrays.)
>>
>
> Yes, indeed.
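A sketch of what point 2 enables, assuming a NumPy release in which the
protocol is enabled and `ndarray.__array_function__` exists (1.17 or later).
`LoggedArray` is a hypothetical subclass written only for this illustration.

```python
import numpy as np


class LoggedArray(np.ndarray):
    """Hypothetical ndarray subclass that records which functions it sees."""

    seen = []

    def __array_function__(self, func, types, args, kwargs):
        # A real subclass would inspect or adjust `args`/`kwargs` here,
        # then let ndarray's own implementation do the actual work.
        type(self).seen.append(getattr(func, "__name__", repr(func)))
        return super().__array_function__(func, types, args, kwargs)


a = np.arange(3).view(LoggedArray)
joined = np.concatenate([a, a])

print(LoggedArray.seen)
print(joined.tolist())
```

The subclass never reimplements `concatenate`; it only hooks in before
deferring to `super()`.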
>