[Numpy-discussion] is __array_ufunc__ ready for prime-time?

josef.pktd at gmail.com josef.pktd at gmail.com
Thu Nov 2 17:51:57 EDT 2017


On Thu, Nov 2, 2017 at 5:09 PM, Benjamin Root <ben.v.root at gmail.com> wrote:

> Duck typing is great and all for classes that implement some or all of the
> ndarray interface... but remember what the main reason for asarray() and
> asanyarray() is: to automatically promote lists and tuples and other
> "array-likes" to ndarrays. Ignoring the use-case of lists of lists is
> problematic at best.
>
> Ben Root
>
>
> On Thu, Nov 2, 2017 at 5:05 PM, Marten van Kerkwijk <
> m.h.vankerkwijk at gmail.com> wrote:
>
>> My 2¢ here is that all code should feel free to assume a certain type of
>> input, as long as it is documented properly, but there is no reason to
>> enforce that by, e.g., putting `asarray` everywhere. Then, for some
>> pieces ducktypes and subclasses will just work like magic, and uses
>> you might never have foreseen become possible. For others, whoever
>> wants to use them has to do work (and it is up to the package maintainers
>> to decide whether or not to accept PRs that implement hooks, etc.)
>>
>> I do see the argument that this way one becomes constrained in the
>> internal implementation, as a change may break an outward-looking
>> function, but while at times this may be inconvenient, in my
>> experience at others it may just make one realize an even better
>> implementation is possible. But then, I really like duck-typing...
>>
>
One general problem is that there is no protocol specifying which operations
those ducks implement in a way equivalent to a numpy ndarray, i.e. whether
they quack in a compatible way.

One small example: the pandas standard deviation, std, used ddof=1 by default
and didn't have an option to override it, whereas numpy uses ddof=0. So even
though we could call the std method of the ducks, the t-test results would
differ a bit, and visibly so in small samples, depending on the type of the
data. A possible alternative would be to compute std from scratch and forgo
the available function or method.
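
A minimal sketch of that mismatch and of the "from scratch" alternative.
SampleStdDuck and std_from_scratch are hypothetical names made up for this
illustration; the duck only stands in for a type whose std() defaults to
ddof=1, and it is not pandas code.

```python
import numpy as np


class SampleStdDuck:
    """Hypothetical duck type whose std() defaults to ddof=1."""

    def __init__(self, values):
        self.values = np.asarray(values, dtype=float)

    def std(self, ddof=1):
        return self.values.std(ddof=ddof)

    def __len__(self):
        return len(self.values)

    def __iter__(self):
        return iter(self.values)


def std_from_scratch(x, ddof=0):
    """Standard deviation with an explicit ddof, using only len() and iteration."""
    n = len(x)
    mean = sum(x) / n
    ss = sum((v - mean) ** 2 for v in x)
    return (ss / (n - ddof)) ** 0.5


data = [1.0, 2.0, 3.0, 4.0]
arr = np.array(data)
duck = SampleStdDuck(data)

# Same data, same method name, different default ddof.
print(arr.std(), duck.std())

# Bypassing the method gives the same answer for both types.
print(std_from_scratch(arr), std_from_scratch(duck))
```

The price of the second approach is that any optimized or specialized
implementation behind the duck's own std method is no longer used.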

I once tried to make the scipy.stats.zscore function agnostic about the type
and not use asarray. It's a simple operation, but it still required special
handling of numpy matrices because they preserve the dimension in reduce
operations. After more than a few lines it is difficult to keep track of
what type is now being used.
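
To illustrate the dimension issue, here is a rough sketch. zscore_naive and
zscore_ndarray are hypothetical helpers written for this example only; they
are not the scipy implementation.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 7.0]])
m = np.asmatrix(a)  # np.matrix view of the same data

# Reductions drop the axis for ndarray but keep it for np.matrix.
print(a.mean(axis=1).shape)  # (2,)
print(m.mean(axis=1).shape)  # (2, 1)


def zscore_naive(x, axis=0):
    """Type-agnostic attempt: relies on whatever shape the reduction returns."""
    return (x - x.mean(axis=axis)) / x.std(axis=axis)


def zscore_ndarray(x, axis=0):
    """Converts to ndarray first and keeps the reduced axis explicitly."""
    x = np.asarray(x)
    mean = x.mean(axis=axis, keepdims=True)
    std = x.std(axis=axis, keepdims=True)
    return (x - mean) / std


z_matrix = zscore_naive(m, axis=1)   # broadcasts (2, 3) against (2, 1)
z_array = zscore_ndarray(a, axis=1)

try:
    zscore_naive(a, axis=1)          # (2, 3) against (2,) does not broadcast
    naive_ok = True
except ValueError:
    naive_ok = False
```

The same expression works for the matrix and fails for the plain ndarray, so
type-agnostic code needs a branch somewhere, or it needs to convert up front.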

Another subclass that is often broken by default code is the masked array,
because asarray throws away the mask.
But asanyarray wouldn't always work either, because handling the masked
values needs dedicated code. For example, scipy.stats ended up with separate
functions for masked arrays.
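
A small illustration of the first point, using only numpy (the data values
are made up for the example):

```python
import numpy as np

# The third value is masked, i.e. it should be ignored.
x = np.ma.masked_array([1.0, 2.0, 100.0], mask=[False, False, True])

plain = np.asarray(x)     # plain ndarray, the mask is gone
kept = np.asanyarray(x)   # still a MaskedArray

print(type(plain).__name__, plain.mean())  # mean includes the masked 100.0
print(type(kept).__name__, kept.mean())    # mean over unmasked values: 1.5
```

Passing the subclass through only helps as long as every later operation is
mask-aware, which is the part that needs extra code.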

Josef



>
>> -- Marten
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>