[Numpy-discussion] Shouldn't all in-place operations simply return self?

Thu Jan 17 11:33:47 EST 2013

On Thu, Jan 17, 2013 at 2:32 PM, Alan G Isaac <alan.isaac at gmail.com> wrote:
> Is it really better to have `permute` and `permuted`
> than to add a keyword?  (Note that these are actually
> still ambiguous, except by convention.)

The convention in question, though, is that of English grammar. In
practice everyone who uses numpy is a more-or-less skilled English
speaker in any case, so re-using the conventions is helpful!

"Shake the martini!" <- an imperative command

This is a complete statement all by itself. You can't say "Hand me the
shake the martini". In procedural languages like Python, there's a
strong distinction between statements (whole lines, a = 1), which only
matter because of their side-effects, and expressions (a + b) which
have a value and can be embedded into a larger statement or expression
((a + b) + c). "Shake the martini" is clearly a statement, not an
expression, and therefore clearly has a side-effect.

"shaken martini" <- a noun phrase

Grammatically, this is like plain "martini": you can use it anywhere
you can use a noun. "Hand me the martini", "Hand me the shaken
martini". In programming terms, it's an expression, not a statement.
And side-effecting expressions are poor style, because when you read
procedural code, you know each statement contains at least one
side-effect, and it's much easier to figure out what's going on if
each statement contains *exactly* one side-effect, and it's the
top-most operation.

This underlying readability guideline is actually baked much more
deeply into Python than the sort/sorted distinction -- this is why in
Python, 'a = 1' is *not* an expression, but a statement. C allows you
to say things like "b = (a = 1)", but in Python you have to say "a =
1; b = a".
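A small standard-library illustration of these conventions (a sketch, not taken from the thread): list.sort() is the imperative form and returns None, sorted() is the noun-phrase form and returns a new list, and a plain assignment cannot be nested inside an expression.

```python
# Imperative form: mutates the list in place and returns None.
xs = [3, 1, 2]
result = xs.sort()
print(result)   # None
print(xs)       # [1, 2, 3]

# Noun-phrase form: returns a new list and leaves the input untouched.
ys = [3, 1, 2]
zs = sorted(ys)
print(ys, zs)   # [3, 1, 2] [1, 2, 3]

# Assignment is a statement, so it cannot be embedded in an expression.
try:
    compile("b = (a = 1)", "<example>", "exec")
    nested_ok = True
except SyntaxError:
    nested_ok = False
    print("SyntaxError: assignment is not an expression")

# The Python spelling uses two statements instead.
a = 1; b = a
```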

> Btw, two separate issues seem to be running side by side.
>
> i. should in-place operations return their result?
> ii. how can we signal that an operation is inplace?
>
> I expect NumPy to do inplace operations when feasible,
> so maybe they could take an `out` keyword with a None default.
> Possibly recognize `out=True` as asking for the original array
> object to be returned (mutated); `out='copy'` as asking for a copy to
> be created, operated upon, and returned; and `out=a` to ask
> for array `a` to be used for the output (without changing
> the original object, and with a return value of None).

Good point that numpy also has a nice convention with out= arguments
for ufuncs. I guess that convention is: by default, return a new array,
but also allow one to modify the same (or another!) array in-place by
passing out=. So this would suggest that we'd have
  b = shuffled(a)
  shuffled(a, out=a)
  shuffled(a, out=b)
  shuffle(a) # same as shuffled(a, out=a)
and if people are bothered by having both 'shuffled' and 'shuffle',
then we drop 'shuffle'. (And the decision about whether to include the
imperative form can be made on a case-by-case basis; having both
shuffled and shuffle seems fine to me, but probably there are other
cases where this is less clear.)
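A minimal sketch of what that pair could look like. Both `shuffled` and `shuffle` are hypothetical names from this proposal, not existing NumPy functions; the sketch assumes the modern `np.random.Generator.permutation`, which returns a shuffled copy along axis 0, and it returns None when out= is given, per the argument below.

```python
import numpy as np

def shuffled(a, out=None, rng=None):
    """Hypothetical noun-phrase form: a permuted copy of `a` along axis 0.

    If `out` is given (it may be `a` itself), the permutation is written
    into `out` and None is returned instead of the array.
    """
    rng = np.random.default_rng() if rng is None else rng
    result = rng.permutation(a)   # always a new array; `a` is not modified
    if out is None:
        return result
    out[...] = result             # copy into the caller's array
    return None

def shuffle(a, rng=None):
    """Hypothetical imperative form: same as shuffled(a, out=a)."""
    shuffled(a, out=a, rng=rng)

a = np.arange(5)
b = shuffled(a)          # new array; a is unchanged
shuffled(a, out=a)       # in place, returns None
shuffle(a)               # same thing, imperative spelling
```

Because `permutation` always allocates a fresh result, `out=a` is safe: the copy back into `a` never reads from a half-written buffer.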

There is also an argument that if out= is given, then we should always
return None, in general. I'm having a lot of trouble thinking of any
situation where it would be acceptable style (or even useful) to write
something like:
  c = np.add(a, b, out=a) + 1
But 'out=' is very large and visible, which makes the readability
less terrible than it could be. And np.add always returns its output
array, including when out= is given (so there's at least a weak
countervailing convention). So I feel much more strongly that
shuffle() should return None than I do that np.add(out=...) should
return None.
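For concreteness, here is how the existing ufunc convention behaves (a short sketch using only np.add): with out= given, the result is written into that array and the same object is also returned, which is what makes the questionable one-liner above legal.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# Ufunc convention: with out= given, the result is written into `out`
# and that same array object is returned.
r = np.add(a, b, out=a)
print(r is a)        # True
print(a)             # [11. 22. 33.]

# The return value lets a mutation of `a` hide inside a larger expression.
c = np.add(a, b, out=a) + 1
print(a)             # [21. 42. 63.]
print(c)             # [22. 43. 64.]
```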

A compromise position would be to make all new functions that take
out= return None when out= is given, while leaving existing ufuncs and
such as they are for now.
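One way the compromise might be applied to new functions is a small wrapper. The decorator `none_when_out_given` and the example function `clipped_to_unit` are hypothetical illustrations, not NumPy API; the sketch relies only on np.clip, which accepts out= and otherwise returns a new array.

```python
import functools
import numpy as np

def none_when_out_given(func):
    """Hypothetical decorator: the wrapped function returns None whenever
    the caller passes out=, and the freshly allocated result otherwise."""
    @functools.wraps(func)
    def wrapper(*args, out=None, **kwargs):
        result = func(*args, out=out, **kwargs)
        return None if out is not None else result
    return wrapper

@none_when_out_given
def clipped_to_unit(a, out=None):
    """Hypothetical new function: clip values into [0, 1]."""
    return np.clip(a, 0.0, 1.0, out=out)

x = np.array([-0.5, 0.25, 2.0])
y = clipped_to_unit(x)        # new array; x unchanged
clipped_to_unit(x, out=x)     # modifies x in place, returns None
```

Existing ufuncs are untouched by this; only functions that opt in through the wrapper get the return-None behavior.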

-n