[SciPy-dev] In-place operators and casting

Thu Nov 24 06:03:39 EST 2005

Ed Schofield wrote:
> Hi all,
> 
> I've been discussing the behaviour of matrix objects with Travis offline
> after I made a rather ugly patch.  The problem I was trying to solve was
> the one described by Jonathan Taylor in the [Default type behaviour of
> array] thread, essentially that:
> 
> 
> 
> >>> c = zeros(10)
> >>> c += rand(10)
> >>> c
> 
> array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
> 
> wasn't what he wanted, and he had to spend time figuring out what was
> wrong.  My idea was to turn matrices into something more user-friendly
> than arrays for users migrating from Matlab, R, etc. by redefining
> matrices' in-place operators like += to have the same upcasting behaviour
> as the regular operators like +.  Then this would be possible:
> 
> 
> 
> >>> b = matrix(zeros(10))
> >>> b
> 
> matrix([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
> 
> >>> b += rand(10)
> >>> b
> 
> matrix([[ 0.80751041,  0.61973329,  0.70726955,  0.94220288,  0.41340826,
>          0.39087675,  0.81454443,  0.25357685,  0.06850165,  0.19652445]])
> 
> 
> Travis says that he doesn't think it makes sense to in-place cast an array
> (or matrix) to a different type, and that a floatzeros() function could be
> sufficient to avoid the problem above.  But I think this only solves one
> instance of a more general usability problem with casting and in-place
> operators.
> 
> I see two requirements for an intuitive += operator (and other <op>=
> friends) without any nasty surprises.  First, 'a += b' should give the
> same result as 'a = a + b', just more efficiently if possible, and this
> shouldn't eat up the data of unsuspecting users.  This isn't true at the
> moment.  Second, 'a += b' shouldn't change 'a' to a different object.
> This is true at the moment:
> 
> 
> >>> a = ones(3)
> >>> id(a)
> 
> 140648904
> 
> >>> a += ones(3, complex)
> >>> id(a)
> 
> 140648904
> 
> Another interpretation of this second point is that the type of an array
> shouldn't change once we've declared it.  I think this is what Travis is
> reluctant to sacrifice for the sake of the first point.
> 
> If a cast from b.dtype to a.dtype can lose information (like in these
> examples) I don't think it's possible for a += b to satisfy both these
> requirements.  The meaning of "a += b" is ambiguous: does the user want a
> safe or unsafe cast?  I propose instead that we raise an exception:
> 
> 
> >>> a = zeros(5)
> >>> a += rand(5)
> 
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: array cannot be safely cast to required type
> 
> We currently have similar examples of type-checking:
> 
> 
> >>> array(rand(5), int)
> 
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: array cannot be safely cast to required type
> 
> 
> >>> a[:] = rand(5)
> 
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: array cannot be safely cast to required type
> 
> This would allow Travis to remove another red warning from his book and
> should save users some grief if they haven't read it (or know it and
> still make the mistake).
> 
> In some cases the user will know that the operation will result in a
> potentially unsafe cast and will want to proceed anyway.  For these cases
> I'd argue that a more explicit notation is no bad thing.  Two options
> are:
> 
> 
> >>> a += cast[int](rand(5))
> >>> a += rand(5).astype(int)
> 
> 
> Another might be the 'FORCECAST' flag, but I'm not sure whether this is
> still supported.
> 
> Comments?
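
The tension between the two requirements quoted above can be seen in a few lines. This is only a sketch, and it assumes present-day NumPy imported as np rather than the package discussed in this thread; the name alias is introduced purely for illustration. It shows that getting the upcast result of 'a = a + b' means binding 'a' to a new object, which is exactly what an in-place operator must not do.

```python
import numpy as np

a = np.ones(3, dtype=int)      # integer array, as in the quoted example
alias = a                      # second name bound to the same array
b = np.ones(3, dtype=complex)

a = a + b                      # the regular operator upcasts ...
print(a.dtype)                 # complex128
print(a is alias)              # False: 'a' now names a different object

# The original integer array is untouched and keeps its type, so an
# in-place '+=' that preserves identity has nowhere to store complex values.
print(alias.dtype == np.dtype(int))   # True
```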

I am not a friend of silent upcasting, since often (my :-)) extension
modules require a certain data type. I would prefer throwing the
exception, with the possibility of overriding via .astype().
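
A minimal sketch of that preference, under stated assumptions: it is written against present-day NumPy (imported as np), and checked_iadd is a hypothetical helper, not a function of any library. It refuses an in-place add whenever the right-hand type cannot be safely cast to the left-hand type, and lets the caller override by casting explicitly with .astype().

```python
import numpy as np


def checked_iadd(a, b):
    """Add b into a in place, refusing casts that could lose information.

    Hypothetical helper for illustration only.
    """
    b = np.asarray(b)
    if not np.can_cast(b.dtype, a.dtype, casting="safe"):
        raise TypeError(
            "array cannot be safely cast to required type "
            f"({b.dtype} -> {a.dtype})"
        )
    a += b  # same object, same dtype; the cast is known to be safe
    return a


a = np.zeros(5, dtype=int)
r = np.random.rand(5)

try:
    checked_iadd(a, r)             # float -> int is lossy, so this raises
except TypeError as exc:
    print("refused:", exc)

# Explicit override: the caller asks for the truncating cast by name.
checked_iadd(a, r.astype(int))

# The safe direction (int into float) goes through without complaint.
f = np.zeros(5)
checked_iadd(f, np.ones(5, dtype=int))
```

With this rule the data type an extension module was handed never changes behind its back, and the lossy case is visible at the call site.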

r.