[Numpy-discussion] Baffling error: ndarray += csc_matrix -> "ValueError: setting an array element with a sequence"

Fri Sep 27 16:33:24 EDT 2013

On Fri, Sep 27, 2013 at 8:27 PM, Pauli Virtanen <pav at iki.fi> wrote:
> 27.09.2013 22:15, Nathaniel Smith wrote:
> [clip]
>> 3) The issue of how to make an in-place operation like ndarray += sparse
>> continue to work in the brave new __numpy_ufunc__ world.
>>
>> For this last issue, I think we disagree. It seems to me that the
>> right answer is that csc_matrix.__numpy_ufunc__ needs to step up and
>> start supporting out=! If I have a large dense ndarray and I try to +=
>> a sparse array to it, this operation should take no temporary memory
>> and nnz time. Right now it sounds like it actually copies the large
>> dense ndarray, which takes time and space proportional to its size.
>> AFAICT the only way to avoid that is for scipy.sparse to implement
>> out=. It shouldn't be that hard...?
>
> Sure, scipy.sparse can easily support the output argument as well.

Great! I guess solving this somehow will be release-critical, to avoid
a regression in this case when __numpy_ufunc__ gets released. If the
solution should be in scipy, I guess we should file the bug there?
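
To make the "no temporary memory and nnz time" goal concrete, here is a
minimal sketch of what an out=-aware add could do internally. It is not
scipy.sparse code: iadd_coo is a hypothetical helper, and it takes plain
COO-style (row, col, data) triplets, which a scipy.sparse matrix exposes
through tocoo().

```python
import numpy as np

def iadd_coo(dense, rows, cols, data):
    """Add COO-style triplets into `dense` in place (hypothetical helper)."""
    # np.add.at is unbuffered, so a repeated (row, col) pair accumulates
    # instead of being overwritten. Only the indexed entries are touched,
    # so the work scales with len(data) rather than with dense.size.
    np.add.at(dense, (rows, cols), data)
    return dense

dense = np.ones((3, 4))
rows = np.array([0, 1, 1])
cols = np.array([0, 2, 2])          # (1, 2) appears twice on purpose
data = np.array([1.0, 2.0, 3.0])
result = iadd_coo(dense, rows, cols, data)
```

The same array object comes back, so nothing the size of the dense
operand is copied along the way.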

> But I still maintain that the implementation of __iadd__ in Numpy is
> wrong.

Oh yeah, totally.

> What it does now is:
>
>         def __iadd__(self, other):
>             return np.add(self, other, out=self)
>
> But since we know this raises a TypeError if the input is of a type that
> cannot be dealt with, it should be:
>
>         def __iadd__(self, other):
>             try:
>                 return np.add(self, other, out=self)
>             except TypeError:
>                 return NotImplemented
>
> Of course, it's written in C so it's a bit more painful to write this.
>
> I think this will have little performance impact, since the check would
> be only a NULL check in the inplace op methods + subsequent handling. I
> can take a look at some point at this...

I'm a little uncertain about the "swallow all TypeErrors" aspect of
this -- e.g. this could have really weird effects for object arrays,
where ufuncs may raise arbitrary user exceptions.
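
A small sketch of the failure mode I have in mind. Picky and
proposed_iadd are made up for illustration; proposed_iadd is just a
Python rendering of the try/except version quoted above, not NumPy's
actual C implementation.

```python
import numpy as np

class Picky:
    """Hypothetical element type whose __add__ fails with a TypeError."""
    def __add__(self, other):
        raise TypeError("bug inside user code")

def proposed_iadd(arr, other):
    """Python rendering of the proposed __iadd__ (illustration only)."""
    try:
        return np.add(arr, other, out=arr)
    except TypeError:
        # This also catches the TypeError raised inside Picky.__add__,
        # not only "the ufunc cannot handle this input type".
        return NotImplemented

arr = np.array([Picky(), Picky()], dtype=object)
outcome = proposed_iadd(arr, 1)
```

Here outcome is NotImplemented, so the message from the user's own code
is lost, and Python would go on to report a generic unsupported-operand
error for += instead.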

One possibility in the long run is to just say, if you want to
override ndarray __iadd__ or whatever, then you have to use
__numpy_ufunc__. Not really much point in having *two* sets of
implementations of the NotImplemented dance for the same operation.
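
As a rough sketch of that single-override route: the class below handles
ndarray += itself purely through the ufunc hook, with no __radd__ or
NotImplemented dance of its own. Assumptions, loudly: it uses the name
the hook eventually shipped under, __array_ufunc__ (NumPy 1.13 and
later), rather than __numpy_ufunc__, and TripletMatrix is a hypothetical
toy container, not anything from scipy.sparse.

```python
import numpy as np

class TripletMatrix:
    """Hypothetical sparse container (COO-style triplets); not a scipy class."""

    def __init__(self, rows, cols, data, shape):
        self.rows = np.asarray(rows)
        self.cols = np.asarray(cols)
        self.data = np.asarray(data, dtype=float)
        self.shape = shape

    def __array_ufunc__(self, ufunc, method, *inputs, out=None, **kwargs):
        # Handle only the in-place pattern np.add(dense, self, out=dense).
        if (ufunc is np.add and method == "__call__" and not kwargs
                and out is not None and len(inputs) == 2
                and inputs[1] is self
                and isinstance(inputs[0], np.ndarray)
                and out[0] is inputs[0]
                and inputs[0].shape == self.shape):
            # Scatter-add the stored entries straight into the output.
            np.add.at(out[0], (self.rows, self.cols), self.data)
            return out[0]
        # Decline everything else; NumPy then raises TypeError.
        return NotImplemented

dense = np.zeros((2, 3))
same_buffer = dense
dense += TripletMatrix([0, 1], [1, 2], [5.0, 7.0], shape=(2, 3))
```

After the +=, dense is still the same array object as same_buffer, with
only the two stored entries changed.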

-n