[Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

Sebastian Berg sebastian at sipsolutions.net
Fri Apr 24 14:23:35 EDT 2020


On Fri, 2020-04-24 at 10:12 -0700, Stephan Hoyer wrote:
> On Fri, Apr 24, 2020 at 6:31 AM Sebastian Berg <
> sebastian at sipsolutions.net>
> wrote:
> 
> > One thing to note is that `__array__` is actually asked to return a
> > copy AFAIK.
> 
> The documentation on __array__ seems to be quite limited, unfortunately.
> The
> most I can find are a few sentences here:
> https://numpy.org/doc/stable/reference/arrays.classes.html#numpy.class.__array__
> 
> I don't see anything about returning copies. My interpretation has
> always
> been that __array__ can return either a copy or a view, like the
> np.asarray() constructor.
> 

Hmmm, right, I am not quite sure why I thought this was the case.

The more important part is behaviour. And the fact is that if you do
`np.array(array_like)` with an array-like that implements `__array__`,
then we ensure a copy is made (`copy=True` is the default), even
though `__array__()` may already have returned a copy.
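
To make that concrete, here is a minimal sketch. `Wrapper` is a
hypothetical array-like invented for illustration; its `__array__`
accepts an optional `copy` argument only so that it also works on
newer NumPy releases that may pass one, which is an assumption beyond
what is discussed in this thread.

```python
import numpy as np


class Wrapper:
    """Hypothetical array-like exposing its buffer through __array__."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        # Older NumPy calls __array__() or __array__(dtype); `copy` is
        # only supplied by newer releases.
        arr = self.data if dtype is None else self.data.astype(dtype, copy=False)
        return arr.copy() if copy else arr


w = Wrapper([1.0, 2.0, 3.0])

as_view = np.asarray(w)   # "copy if necessary": the buffer is reused here
as_copy = np.array(w)     # copy=True by default: always a fresh buffer

shares_view = np.shares_memory(as_view, w.data)
shares_copy = np.shares_memory(as_copy, w.data)
```

With this object, `np.asarray` hands back the wrapped buffer, while
`np.array` ends up with independent memory no matter what
`__array__()` returned.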

In any case, the current default for `np.asarray`, i.e. `copy=False` is
"copy if necessary". So if PyTorch uses a new parameter to Opt-In to
copying, the default behaviour will depend on the object. The
definition would then be:

    Copy if necessary but error if a copy is necessary and the
    object doesn't want to be copied silently.

To be honest, that seems not totally terrible to me... The old
statement remains true with the small caveat that it will sometimes
cause a loud error explaining things. The only problem is that some
users may want the explicit `np.copy_if_necessary` to get PyTorch to
do what most already do on `copy=False`.

I guess the new behaviour would then be:

if copy is np.never_copy:  # or however we signal it
    try:
        arr = obj.__array__(copy=np.never_copy)
    except TypeError as e:
        raise TypeError("no copy appears unsupported by ...!") from e
elif copy is np.copy_if_necessary:
    # Some users may want to tell PyTorch not to error, while
    # still telling pandas that a view is OK:
    try:
        arr = obj.__array__(copy=np.copy_if_necessary)
    except TypeError:
        arr = obj.__array__()
elif not copy:
    # Behaviour here may depend on the array-like!
    # current array likes may or may not return a copy,
    # new ones may choose to raise an error when a view
    # is not possible.
    arr = obj.__array__()
else:
    try:
        arr = obj.__array__(copy=True)
    except TypeError:
        arr = obj.__array__()
        arr = arr.copy()  # make sure it's a copy

PyTorch can then implement copy, but raise an error if `copy=False`
(which must be the default). Current objects will error for
`np.never_copy` but otherwise be fine. And they can implement `copy` to
avoid an unnecessary double copy if they wish.
We could add the `np.copy_if_necessary` to be an explicit replacement
for the current `copy=False`. This will be necessary, or nicer, unless
everyone is happy to copy by default.
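
For anyone who wants to play with the dispatch above, here is a rough,
runnable sketch of it. Everything in it is an assumption made for
illustration: `NEVER_COPY` and `COPY_IF_NECESSARY` are plain sentinel
objects standing in for the proposed `np.never_copy` and
`np.copy_if_necessary` (neither exists in NumPy under those names),
`to_array` is a stand-in for the conversion logic, and `Legacy` and
`Strict` are made-up array-likes.

```python
import numpy as np

# Hypothetical sentinels for the proposed np.never_copy / np.copy_if_necessary.
NEVER_COPY = object()
COPY_IF_NECESSARY = object()


def to_array(obj, copy=False):
    """Sketch of the proposed dispatch; not NumPy's actual implementation."""
    if copy is NEVER_COPY:
        try:
            return obj.__array__(copy=NEVER_COPY)
        except TypeError as e:
            name = type(obj).__name__
            raise TypeError(f"no copy appears unsupported by {name}") from e
    if copy is COPY_IF_NECESSARY:
        try:
            return obj.__array__(copy=COPY_IF_NECESSARY)
        except TypeError:
            return obj.__array__()  # object predates the keyword
    if not copy:
        # Behaviour depends on the array-like: it may copy, view, or raise.
        return obj.__array__()
    try:
        return obj.__array__(copy=True)
    except TypeError:
        return obj.__array__().copy()  # make sure it is a copy


class Legacy:
    """Array-like whose __array__ predates the copy keyword."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array__(self, dtype=None):
        return self.data


class Strict:
    """Array-like that refuses to copy silently (the PyTorch-style case)."""

    def __init__(self, data, needs_copy):
        self.data = np.asarray(data)
        self.needs_copy = needs_copy

    def __array__(self, dtype=None, copy=False):
        if not self.needs_copy and copy is not True:
            return self.data  # a view is possible and acceptable
        if copy is False or copy is NEVER_COPY:
            raise TypeError("a copy is required; opt in to copying explicitly")
        return self.data.copy()


legacy = Legacy([1, 2, 3])
strict = Strict([1, 2, 3], needs_copy=True)

legacy_default = to_array(legacy)                       # view, as today
legacy_copied = to_array(legacy, copy=True)             # fallback .copy()
strict_opt_in = to_array(strict, copy=COPY_IF_NECESSARY)  # copies quietly
```

Note that catching `TypeError` to detect a missing keyword also
swallows unrelated `TypeError`s raised inside `__array__`; the sketch
inherits that weakness from the pseudocode.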

Another side note: calls such as `np.array([arr1, arr2])` probably must
always fail if `copy=np.never_copy` since a view is not guaranteed.
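
A small check of why that is: stacking separate arrays has to allocate
a new buffer, so the result cannot be a view of either input. The
variable names here are just for illustration.

```python
import numpy as np

arr1 = np.arange(3)
arr2 = np.arange(3, 6)

# Building a 2-D array from two separate arrays allocates new memory.
stacked = np.array([arr1, arr2])
stacked[0, 0] = 99  # does not write through to arr1

shares_first = np.shares_memory(stacked, arr1)
shares_second = np.shares_memory(stacked, arr2)
```

Neither input shares memory with `stacked`, so a "never copy" request
could not be honoured for this call.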

- Sebastian


> 
> > I doubt it always does, but if it does not I assume the
> > object should and could provide `__array_interface__`.
> > 
> 
> Objects like xarray.DataArray and pandas.Series sometimes directly
> wrap
> NumPy arrays and sometimes don't.
> 
> They both implement __array__ but not __array_interface__. It's very
> obvious
> how to implement a "forwarding" __array__ method (just call
> `np.asarray()`
> on an argument that might implement it). I guess something similar
> could be
> done for __array_interface__, but it's not clear to me that it's
> right to
> implement __array_interface__ when doing so might require a copy.
> 

Yes, I do not think you should implement __array_interface__ then,
unless "simplifying the array" is for some reason beneficial for
yourself. I suppose you could raise an AttributeError, but it is
questionable if that's good.
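
For reference, the "forwarding" `__array__` Stephan mentions could look
roughly like the sketch below. `Forwarding` is a hypothetical container
invented for illustration; the optional `copy` argument is an assumption
so the method also tolerates NumPy releases that pass one, and the
strict "never copy" case is deliberately left out.

```python
import numpy as np


class Forwarding:
    """Hypothetical container that sometimes wraps an ndarray directly."""

    def __init__(self, values):
        self._values = values  # may be an ndarray, a list, or another array-like

    def __array__(self, dtype=None, copy=None):
        # Forward to np.asarray(), which in turn calls __array__ on the
        # wrapped object if it implements it.
        arr = np.asarray(self._values, dtype=dtype)
        # Simplification: only an explicit copy=True request is handled;
        # a strict no-copy request is not checked in this sketch.
        return arr.copy() if copy else arr


inner = np.arange(3)
wrapped_array = np.asarray(Forwarding(inner))           # reuses inner's buffer
wrapped_list = np.asarray(Forwarding([1, 2]))           # has to build a new array
nested = np.asarray(Forwarding(Forwarding(inner)))      # forwarding composes
```

The wrapped-list case is exactly why `__array_interface__` is awkward
here: there is no existing buffer to describe until a copy has been made.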


> 
> > Under that assumption, it would be an opt-out right now since NumPy
> > allows copies by default here.
> > Defining things along copy does seem sensible, though I do not know
> > how
> > it would play with some of the current array-likes choosing to
> > refuse
> > `__array__`.
> > 
> > - Sebastian
> > 
> > 
> > 
> > > Eric
> > > 
> > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias <
> > > jni at fastmail.com>
> > > wrote:
> > > 
> > > > Hi everyone,
> > > > 
> > > > One bit of expressivity we would miss is “copy if necessary,
> > > > but
> > > > otherwise
> > > > > don’t bother”, but there are workarounds to this.
> > > > > 
> > > > 
> > > > After a side discussion with Stéfan van der Walt, we came up
> > > > with
> > > > `allow_copy=True`, which would express to the downstream
> > > > library
> > > > that we
> > > > don’t mind waiting, but that zero-copy would also be ok.
> > > > 
> > > > This sounds like the sort of thing that is use case driven. If
> > > > enough
> > > > projects want to use it, then I have no objections to adding
> > > > the
> > > > keyword.
> > > > OTOH, we need to be careful about adding too many
> > > > interoperability
> > > > tricks
> > > > > as they complicate the code and make it hard for folks to
> > > > determine the
> > > > best solution. Interoperability is a hot topic and we need to
> > > > be
> > > > careful
> > > > > not to leave behind too many experiments in the NumPy
> > > > code.  Do you
> > > > have any other ideas of how to achieve the same effect?
> > > > 
> > > > 
> > > > Personally, I don’t have any other ideas, but would be happy to
> > > > hear some!
> > > > 
> > > > My view regarding API/experiment creep is that `__array__` is
> > > > the
> > > > oldest
> > > > and most basic of all the interop tricks and that this can be
> > > > safely
> > > > maintained for future generations. Currently it only takes
> > > > `dtype=`
> > > > as a
> > > > keyword argument, so it is a very lean API. I think this
> > > > particular
> > > > use
> > > > case is very natural and I’ve encountered the reluctance to
> > > > implicitly copy
> > > > twice, so I expect it is reasonably common.
> > > > 
> > > > Regarding difficulty in determining the best solution, I would
> > > > be
> > > > happy to
> > > > contribute to the dispatch basics guide together with the new
> > > > kwarg. I
> > > > agree that the protocols are getting quite numerous and I
> > > > couldn’t
> > > > find a
> > > > single place that gathers all the best practices together. But,
> > > > to
> > > > reiterate my point: `__array__` is the simplest of these and I
> > > > think this
> > > > keyword is pretty safe to add.
> > > > 
> > > > For ease of discussion, here are the API options discussed so
> > > > far,
> > > > as well
> > > > as a few extra that I don’t like but might trigger other ideas:
> > > > 
> > > > np.asarray(my_duck_array, allow_copy=True)  # default is False,
> > > > or
> > > > None ->
> > > > leave it to the duck array to decide
> > > > np.asarray(my_duck_array, copy=True)  # always copies, but, if
> > > > supported
> > > > by the duck array, defers to it for the copy
> > > > > np.asarray(my_duck_array, copy='allow')  # could take values
> > > > > 'allow',
> > > > > 'force', 'no', True(='force'), False(='no')
> > > > np.asarray(my_duck_array, force_copy=False, allow_copy=True)  #
> > > > separate
> > > > concepts, but unclear what force_copy=True, allow_copy=False
> > > > means!
> > > > np.asarray(my_duck_array, force=True)
> > > > 
> > > > Juan.
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion at python.org
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > 
> > > 



