[SciPy-dev] In-place operators and casting

Wed Nov 23 21:39:29 EST 2005

Hi all,

I've been discussing the behaviour of matrix objects with Travis offline
after I made a rather ugly patch.  The problem I was trying to solve was
the one described by Jonathan Taylor in the [Default type behaviour of
array] thread, essentially that:

>>> c = zeros(10)
>>> c += rand(10)
>>> c
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

wasn't what he wanted, and he had to spend time figuring out what was
wrong.  My idea was to turn matrices into something more user-friendly
than arrays for users migrating from Matlab, R, etc. by redefining
matrices' in-place operators like += to have the same upcasting behaviour
as the regular operators like +.  Then this would be possible:

>>> b = matrix(zeros(10))
>>> b
matrix([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
>>> b += rand(10)
>>> b
matrix([[ 0.80751041,  0.61973329,  0.70726955,  0.94220288,  0.41340826,
         0.39087675,  0.81454443,  0.25357685,  0.06850165,  0.19652445]])
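
A minimal sketch of this idea, assuming present-day NumPy imported as np rather than the scipy core of this message. The class name UpcastingArray is hypothetical, and it subclasses ndarray instead of matrix to keep the example short. It shows one way an in-place operator could upcast: by falling back to the out-of-place operation, which means the name ends up bound to a new object.

```python
import numpy as np

class UpcastingArray(np.ndarray):
    """Hypothetical ndarray subclass whose += upcasts the way + does."""

    def __iadd__(self, other):
        # Use the out-of-place add so the result takes the upcast dtype.
        # Python then rebinds the left-hand name to this new object.
        return np.add(self, other)

# dtype=int is spelled out because modern np.zeros defaults to float.
b = np.zeros(10, dtype=int).view(UpcastingArray)
original = b                     # second reference to the same array
b += np.random.rand(10)

print(b.dtype)                   # floating-point dtype after upcasting
print(b is original)             # False: b now names a different object
print(original.dtype)            # the old integer array is left untouched
```

The random values survive, but only because the operation is no longer really in place: any other reference to the original array still sees the integer zeros.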

Travis says that he doesn't think it makes sense to in-place cast an array
(or matrix) to a different type, and that a floatzeros() function could be
sufficient to avoid the problem above.  But I think this only solves one
instance of a more general usability problem with casting and in-place
operators.

I see two requirements for an intuitive += operator (and its other <op>=
friends) that holds no nasty surprises.  First, 'a += b' should give the
same result as 'a = a + b', just more efficiently if possible, so that it
never silently eats the data of unsuspecting users.  This isn't true at
the moment.  Second, 'a += b' shouldn't change 'a' to a different object.
This is true at the moment:

>>> a = ones(3)
>>> id(a)
140648904
>>> a += ones(3, complex)
>>> id(a)
140648904

Another interpretation of this second point is that the type of an array
shouldn't change once we've declared it.  I think this is what Travis is
reluctant to sacrifice for the sake of the first point.

If a cast from b.dtype to a.dtype can lose information (as in these
examples), I don't think it's possible for a += b to satisfy both these
requirements.  The meaning of "a += b" is ambiguous: does the user want a
safe or unsafe cast?  I propose instead that we raise an exception:

>>> a = zeros(5)
>>> a += rand(5)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: array cannot be safely cast to required type
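
The proposed check can be sketched as a standalone function, assuming present-day NumPy imported as np. The name checked_iadd is hypothetical; np.can_cast with casting="safe" stands in for the "can lose information" test, and the error message is copied from the traceback above.

```python
import numpy as np

def checked_iadd(a, b):
    """Hypothetical helper: add b into a in place, or refuse to."""
    b = np.asarray(b)
    # Refuse when converting b's dtype to a's dtype could lose information.
    if not np.can_cast(b.dtype, a.dtype, casting="safe"):
        raise TypeError("array cannot be safely cast to required type")
    np.add(a, b, out=a)          # genuinely in place: a stays the same object
    return a

# dtype=int is spelled out because modern np.zeros defaults to float.
a = np.zeros(5, dtype=int)
try:
    checked_iadd(a, np.random.rand(5))
except TypeError as exc:
    print("refused:", exc)

f = np.zeros(5)
checked_iadd(f, np.ones(5, dtype=int))   # int -> float is a safe cast
print(f)
```

With this rule the integer array is never silently modified, and the safe direction (adding integers into a float array) still works in place.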

We currently have similar examples of type-checking:

>>> array(rand(5), int)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: array cannot be safely cast to required type

>>> a[:] = rand(5)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: array cannot be safely cast to required type

This would allow Travis to remove another red warning from his book and
should save users some grief if they haven't read it (or have read it and
still make the mistake).

In some cases the user will know that the operation will result in a
potentially unsafe cast and will want to proceed anyway.  For these cases
I'd argue that a more explicit notation is no bad thing.  Two options
are:

>>> a += cast[int](rand(5))
>>> a += rand(5).astype(int)

Another might be the 'FORCECAST' flag, but I'm not sure whether this is
still supported.
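
For comparison, a small sketch of explicit opt-in casts, assuming present-day NumPy imported as np. The astype spelling is the one given above; the casting="unsafe" keyword on the ufunc call is a NumPy feature not mentioned in this message, shown here only as one more way to make the intent explicit.

```python
import numpy as np

# dtype=int is spelled out because modern np.zeros defaults to float.
a = np.zeros(5, dtype=int)
b = np.array([0.5, 1.5, 2.5, 3.5, 4.5])

# Option 1: truncate the right-hand side first, then add integers in place.
a += b.astype(int)

# Option 2: add in floating point, then ask the ufunc to store the result
# into the integer array with an explicitly unsafe cast.
c = np.zeros(5, dtype=int)
np.add(c, b, out=c, casting="unsafe")

print(a, c)
```

Either way the lossy step is visible in the code. The two options truncate at different points (before versus after the addition), so they need not agree when the left-hand array is non-zero.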

Comments?

-- Ed