[Numpy-discussion] asarray/anyarray; matrix/subclass

Sat Nov 10 17:20:46 EST 2018

> On Saturday, Nov 10, 2018 at 9:16 PM, Stephan Hoyer <shoyer at gmail.com> wrote:
> On Sat, Nov 10, 2018 at 9:49 AM Marten van Kerkwijk <m.h.vankerkwijk at gmail.com> wrote:
> > Hi Hameer,
> >
> > I do not think we should change `asanyarray` itself to special-case matrix; rather, we could start converting `asarray` to `asanyarray` and solve the problems that produces for matrices in `matrix` itself (e.g., by overriding the relevant function with `__array_function__`).
> >
> > I think the idea of providing an `__anyarray__` method (in analogy with `__array__`) might work. Indeed, the default in `ndarray` (and thus all its subclasses) could be to let it return `self` and to override it for `matrix` to return an ndarray view.
>
> Yes, we would certainly rather implement a matrix.__anyarray__ method (if we're already doing a new protocol) than special-case np.matrix explicitly.
>
> Unfortunately, per Nathaniel's comments about NA skipping behavior, it seems like we will also need MaskedArray.__anyarray__ to return something other than itself. In principle, we should probably write a new version of MaskedArray that doesn't deviate from ndarray semantics, but that's a rather large project (we'd also probably want to stop subclassing ndarray).
>
> Changing the default aggregation behavior for the existing MaskedArray is also an option, but that would be a serious annoyance to users and a backwards compatibility break. If the only way MaskedArray violates Liskov is in terms of NA skipping aggregations by default, then this might be viable. In practice, this would require adding an explicit skipna argument so FutureWarnings could be silenced. The plus side of this option is that it would make it easier to use np.asanyarray() or any new coercion function throughout the internal NumPy code base.
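
To make the skipna transition concrete, here is a rough sketch of how such a deprecation path might look. The helper name `masked_mean` and the choice that skipna=False returns `np.ma.masked` when any entry is masked are my assumptions for illustration, not an existing or agreed NumPy API.

```python
import warnings

import numpy as np


def masked_mean(arr, skipna=None):
    """Hypothetical helper: mean of a MaskedArray with an explicit skipna flag."""
    if skipna is None:
        # No explicit choice: keep today's behavior, but warn about the change.
        warnings.warn(
            "skipna currently defaults to True but will default to False; "
            "pass skipna explicitly to silence this warning",
            FutureWarning,
            stacklevel=2,
        )
        skipna = True
    if not skipna and np.ma.is_masked(arr):
        # Assumed semantics: a masked entry makes the whole result masked.
        return np.ma.masked
    # Current MaskedArray behavior: masked entries are skipped.
    return arr.mean()


x = np.ma.masked_array([1.0, 2.0, 30.0], mask=[False, False, True])
skipping = masked_mean(x, skipna=True)      # mean over the two unmasked values
propagating = masked_mean(x, skipna=False)  # np.ma.masked under the assumption above
```

Passing skipna explicitly is what lets users opt in to either behavior without seeing the warning.
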
>
> To summarize, I think these are our options:
> 1. Change the behavior of np.asanyarray() to check for an __anyarray__() protocol. Change np.matrix.__anyarray__() to return a base numpy array (this is a minor backwards compatibility break, but probably for the best). Start issuing a FutureWarning for any MaskedArray operations that violate Liskov and add a skipna argument that in the future will default to skipna=False.
>
> 2. Introduce a new coercion function, e.g., np.duckarray(). This is the easiest option because we don't need to clean up NumPy's existing ndarray subclasses.

My vote is still for 1. I don’t have an issue with PyData/Sparse depending on recent-ish NumPy versions; it’ll need a lot of the recent protocols anyway. I could be convinced otherwise if major package devs (scikits, SciPy, Dask) were to weigh in and say they’ll jump on it (which seems unlikely given SciPy’s policy to support old NumPy versions).
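
For concreteness, option 1 could look roughly like the sketch below. Nothing here exists in NumPy today: `anyarray` stands in for a modified `np.asanyarray`, `__anyarray__` is the proposed protocol, and `WellBehaved` and `MatrixLike` are made-up subclasses used only to show the two cases (return `self` versus return a base ndarray view).

```python
import numpy as np


def anyarray(obj):
    """Hypothetical coercion function that honors a proposed __anyarray__ protocol."""
    method = getattr(type(obj), "__anyarray__", None)
    if method is not None:
        return method(obj)
    # No protocol defined: fall back to the existing behavior.
    return np.asanyarray(obj)


class WellBehaved(np.ndarray):
    """Made-up subclass that keeps ndarray semantics, so it passes through as-is."""

    def __anyarray__(self):
        return self


class MatrixLike(np.ndarray):
    """Made-up stand-in for np.matrix: opts out by returning a base ndarray view."""

    def __anyarray__(self):
        return self.view(np.ndarray)


a = np.arange(4).view(WellBehaved)
m = np.arange(4).view(MatrixLike)

kept = anyarray(a)       # the same WellBehaved object
stripped = anyarray(m)   # a plain ndarray sharing m's memory
coerced = anyarray([1, 2, 3])  # ordinary sequences still become ndarrays
```

In the actual proposal the default `__anyarray__` would live on `ndarray` itself, so only subclasses that deviate from ndarray semantics would need to override it.
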

> P.S. I'm just glad pandas stopped subclassing ndarray a while ago -- there's no way pandas.Series() could be fixed up to not violate Liskov :).