[Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

Sebastian Berg sebastian at sipsolutions.net
Wed Apr 29 14:17:03 EDT 2020


On Wed, 2020-04-29 at 05:26 -0500, Juan Nunez-Iglesias wrote:
> Hi everyone, and thank you Ralf for carrying the flag in my absence.
> =D
> 
> Sebastian, the *primary* motivation behind avoiding detach() in
> PyTorch is listed in the original post of the PyTorch issue:
> 
> > People not very familiar with `requires_grad` and cpu/gpu Tensors
> > might go back and forth with numpy. For example doing pytorch ->
> > numpy -> pytorch and backward on the last Tensor. This will
> > backward without issue but not all the way to the first part of the
> > code and won’t raise any error.
> 
> The PyTorch team are concerned that they will be overwhelmed with
> help requests if np.array() silently succeeds on a tensor with
> gradients. I definitely get that.

Sorry for playing advocatus diaboli...

I guess my point is simply that, before we finalize this, it would be
nice to have a short list of projects:

* Napari, matplotlib on the "user" side
* PyTorch, ...? on the "provider" side

And maybe also what each of them expects from `force=True`, to make
sure those expectations roughly align.
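For concreteness, here is a minimal sketch of one way the two sides could line up, assuming the proposed `force=` keyword (it is not an existing NumPy or PyTorch `__array__` argument). `ToyTensor` is a hypothetical provider loosely modelled on the PyTorch concern quoted below: it refuses implicit conversion while it tracks gradients, but `force=True` always succeeds. `to_numpy_for_display` is a hypothetical "end-point" consumer that opts in without knowing which library it is talking to.

```python
import inspect

import numpy as np


class ToyTensor:
    """Hypothetical provider, loosely modelled on a gradient-tracking tensor.

    Implicit conversion is refused while `requires_grad` is set; the
    proposed `force=True` opts in and always succeeds.
    """

    def __init__(self, data, requires_grad=False):
        self._data = np.asarray(data, dtype=float)
        self.requires_grad = requires_grad

    def __array__(self, dtype=None, force=False):
        if self.requires_grad and not force:
            raise RuntimeError("implicit conversion would silently drop gradient tracking")
        # The result is a plain ndarray with no gradient information attached.
        return self._data if dtype is None else self._data.astype(dtype)


def to_numpy_for_display(obj):
    """Hypothetical end-point helper (e.g. for plotting or writing to disk)."""
    if isinstance(obj, np.ndarray):
        return obj
    array_method = getattr(type(obj), "__array__", None)
    if array_method is not None:
        try:
            accepts_force = "force" in inspect.signature(array_method).parameters
        except (TypeError, ValueError):  # e.g. some C-implemented methods
            accepts_force = False
        if accepts_force:
            # Provider understands the proposed keyword: ask for a conversion
            # that is expected to always succeed.
            return np.asarray(obj.__array__(force=True))
    # Otherwise fall back to ordinary coercion, which may still raise.
    return np.asarray(obj)


t = ToyTensor([1, 2, 3], requires_grad=True)
print(to_numpy_for_display(t))  # the "user" side opts in explicitly
try:
    np.asarray(t)  # the default path is still refused
except RuntimeError as exc:
    print("refused:", exc)
```

The helper inspects the signature instead of catching `TypeError`, so that a genuine error raised inside a provider's `__array__` is not mistaken for "keyword not supported".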

At this time, the best definition of when to use `force=True` seems to
be: by "end-point" users (such as visualization, or maybe writing to disk?).

I still think performance can be just as valid an issue there. For
example, it may be better to convert to a NumPy array earlier in the
computation. Or someone could be surprised that saving their GPU array
to an HDF5 file is by far the slowest part of the computation.

I have the feeling that the definition we actually want is:

   There is definitely no way to do this computation faster or better
   than by converting it to a NumPy array.

Currently, the main reason to reject it seems to be, roughly:

   Wait, are you sure there is not a much better way than using NumPy
   arrays, be careful!

And while that distinction is clear for PyTorch + visualization, I am
not yet sure that it is clear for the various combinations of
`force=True` callers and array-like providers.
Maybe CuPy does not want h5py to use `force=True`, because CuPy has its
own super-fast "stream-to-file" functionality... but it does want napari
to use it.
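Juan's list in the quoted message (sparse densification, GPU-to-CPU copies, dask computation, xarray's loss of labels) shows why a single boolean is both attractive and limited. Below is a minimal sketch with three toy providers, all hypothetical and only loosely modelled on those libraries, each gating a different cost behind the same proposed `force=` keyword. One generic call converts all of them, but the flag says nothing about which cost the caller accepted, or whether the caller is a viewer or a file writer.

```python
import numpy as np


class ForceGatedArray:
    """Hypothetical base class: refuses implicit conversion unless force=True.

    `force` is the keyword proposed in this thread; it is NOT an argument
    that current NumPy passes to `__array__`.
    """

    reason = "conversion is expensive or lossy"

    def _materialize(self):
        raise NotImplementedError

    def __array__(self, dtype=None, force=False):
        if not force:
            raise RuntimeError(f"{type(self).__name__}: {self.reason}; use force=True")
        out = self._materialize()
        return out if dtype is None else out.astype(dtype)


class ToySparse(ForceGatedArray):
    """Stores only the nonzero entries; conversion allocates the dense array."""

    reason = "densifying may need a lot of memory"

    def __init__(self, shape, nonzeros):
        self.shape = shape
        self.nonzeros = nonzeros  # mapping {(row, col): value}

    def _materialize(self):
        dense = np.zeros(self.shape)
        for index, value in self.nonzeros.items():
            dense[index] = value
        return dense


class ToyLazy(ForceGatedArray):
    """Holds a deferred computation (dask-like); conversion runs it."""

    reason = "conversion triggers arbitrary computation"

    def __init__(self, compute):
        self.compute = compute  # zero-argument callable returning array data

    def _materialize(self):
        return np.asarray(self.compute())


class ToyLabeled(ForceGatedArray):
    """Values plus dimension names (xarray-like); conversion drops the names."""

    reason = "conversion discards the dimension labels"

    def __init__(self, values, dims):
        self.values = np.asarray(values)
        self.dims = dims

    def _materialize(self):
        return self.values.copy()


providers = [
    ToySparse((2, 2), {(0, 1): 5.0}),
    ToyLazy(lambda: np.arange(3) * 2),
    ToyLabeled([[1, 2]], dims=("y", "x")),
]

for p in providers:
    # The default conversion is refused...
    try:
        np.asarray(p)
    except RuntimeError as exc:
        print("refused:", exc)
    # ...while one generic opt-in works for every provider, whatever the cost.
    print("forced:", p.__array__(force=True).tolist())
```

The base class keeps the gating in one place, and each toy provider only states how to materialize itself and why that is costly. Because `force=True` carries no information about the caller, a provider in this design cannot accept it from a viewer while steering a file writer toward a faster native export.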

- Sebastian


> 
> Avoiding .cpu() is more straightforwardly about avoiding implicit
> expensive computation.
> 
> > while others do not choose to teach about it. There seems to be very
> > little or even no "promise" attached to either `force=True` or
> > `force=False`.
> 
> NumPy can set a precedent through policy. The *only* reason client
> libraries would implement `__array__` is to play well with NumPy, so
> if NumPy documents that `force=True` should *always* succeed, we can
> expect client libraries to follow suit. At least the PyTorch devs
> have indicated that they would be open to this.
> 
> > E.g. Napari wants to use it, but do the array-providers want Napari
> > to use it?
> 
> As Ralf pointed out, the PyTorch devs have already agreed to it.
> 
> From the napari perspective, we'd be ok with leaving the decision on
> warnings to client libraries. We may or may not suppress them
> depending on user requests. ;) But the point is to have a way of
> saying "give me a NumPy array DAMMIT" without having to know about
> all the possible array libraries. Which are numerous and getting
> numerouser.
> 
> Ralf, you said you don't want warnings — even for sparse arrays? That
> was an area of concern for you on the PyTorch discussion side.
> 
> > And if the conversion still gives warnings for some array-objects,
> > have we actually gained much?
> 
> Yes.
> 
> Hameer,
> 
> > I would advocate for a `force=` kwarg, but personally I don't think
> > it's explicit enough; it's probably as explicit as can be, given
> > NumPy's API.
> 
> Yeah, I agree that `force` is kind of vague, which is why I was looking
> for things like `allow_copy`. But it is hard to be general enough
> here: sparse requires an expensive instantiation, CuPy requires
> copying from GPU to CPU, dask requires arbitrary computation, xarray
> requires information loss... I'm inclined to agree with Ralf that
> `force=` is the only generic-enough term, but I'm happy to entertain
> other options!
> 
> Juan.