[Numpy-discussion] A case for rank-0 arrays

Thu Feb 23 21:34:01 EST 2006

Sasha wrote:

>The main criticism of supporting both scalars and rank-0 arrays is
>that it is "unpythonic" in the sense that it provides two almost
>equivalent ways to achieve the same result.  However, I am now
>convinced that this is the case where practicality beats purity.
>  
>
I think most of us agree that both will be with us for the indefinite 
future.

>The situation with ndarrays is somewhat similar. A rank-N array is
>very similar to a function with N arguments, where each argument has a
>finite domain (i-th domain of a is range(a.shape[i])).  A rank-0 array
>is just a function with no arguments and as such it is quite different
>from a scalar.  
>
I can buy this view.  Nicely done.

>Just as a function with no arguments cannot be
>replaced by a constant in the case when a value returned may change
>during the run of the program, a rank-0 array cannot be replaced by an
>array scalar because it is mutable.  (See
>http://projects.scipy.org/scipy/numpy/wiki/ZeroRankArray for use
>cases).
>
>Rather than trying to hide rank-0 arrays from the end-user and treat
>them as an implementation artifact, I believe numpy should emphasize the
>difference between rank-0 arrays and scalars and have clear rules on
>when to use what.
>  
>
I agree.  The problem is what the rules should be.  Right now, there are 
no clear rules other than "rank-0 arrays: DON'T".

You make a case that we should not be so hard on rank-0 arrays.
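
To make the mutability point concrete, here is a minimal sketch. It assumes a current NumPy imported as np; the 2006 code base under discussion may have behaved differently in detail. The names alias and t are only illustrative.

```python
import numpy as np

# A rank-0 array can be updated in place, so every reference sees the change.
a = np.array(5)
alias = a            # second reference to the same array object
a[...] = 7           # in-place assignment into the rank-0 array

# An array scalar is immutable: "+=" rebinds the name to a new object.
s = np.int32(5)
t = s                # second reference to the original scalar
s += 2               # s now names a new scalar; t is untouched

print(alias, a.ndim)   # the alias follows the in-place update
print(s, t)            # the scalar copy does not
```

This is the sense in which a rank-0 array behaves like a function with no arguments whose return value can change, while a scalar behaves like a constant.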

>PROPOSALS
>==========
>
>Here are three suggestions:
>
>1. Probably the most controversial question is what getitem should
>return. I believe that most of the confusion comes from the fact that
>the same syntax implements two different operations: indexing and
>projection (for lack of a better name).  Using the analogy between
>ndarrays and functions, indexing is just the application of the
>function to its arguments and projection is the function projection
>((f, x) -> lambda (*args): f(x, *args)).
>
>The problem is that the same syntax results in different operations
>depending on the rank of the array.
>
>Let
>
>>>> x = ones((2,2))
>>>> y = ones(2)
>
>then x[1] is projection and type(x[1]) is ndarray, but y[1] is
>indexing and type(y[1]) is int32.  Similarly, y[1,...] is indexing,
>while x[1,...] is projection.
>
>I propose to change numpy rules so that if ellipsis is present inside
>[], the operation is always projection and both y[1,...] and
>x[1,1,...] return zero-rank arrays.  Note that I have previously
>rejected Francesc's idea that x[...] and x[()] should have different
>meaning for zero-rank arrays.  I was wrong.
>  
>
I think this is a good and clear rule.  And it seems like we may be 
"almost" there.
Anybody want to implement it?
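
For reference, a small sketch of the indexing/projection distinction as it can be observed with a current NumPy (an assumption: this is today's behavior, not necessarily what the 2006 tree did, and ones() now defaults to float64 rather than the int32 mentioned above).

```python
import numpy as np

x = np.ones((2, 2))
y = np.ones(2)

row = x[1]               # projection: a rank-1 ndarray
elem = y[1]              # indexing: an array scalar
proj_y = y[1, ...]       # ellipsis present: a rank-0 ndarray
proj_x = x[1, 1, ...]    # ellipsis present: a rank-0 ndarray

# The same distinction applied to a rank-0 array itself.
z = np.array(3.0)
still_array = z[...]     # stays a rank-0 ndarray
as_scalar = z[()]        # converts to an array scalar

for name, value in [("x[1]", row), ("y[1]", elem),
                    ("y[1, ...]", proj_y), ("x[1, 1, ...]", proj_x),
                    ("z[...]", still_array), ("z[()]", as_scalar)]:
    print(name, type(value).__name__, np.shape(value))
```

With this rule the ellipsis acts as an explicit request for "an array, whatever its rank", and the empty tuple as a request for the element.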

>2. Another source of ambiguity is the various "reduce" operations such
>as sum or max.  Using the previous example, type(x.sum(axis=0)) is
>ndarray, but type(y.sum(axis=0)) is int32.  I propose three changes:
>
>   a. Make x.sum(axis)  return ndarray unless axis is None, making
>type(y.sum(axis=0)) is ndarray true in the example.
>
>  
>
Hmm... I'm not sure.  y.sum(axis=0) is the default spelling of sum(y).  
Thus, this would cause all old code to return a rank-0 array.

Most people who write sum(y) want a scalar, not a "function with 0 
arguments".
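
A short sketch of the reduce ambiguity being discussed, assuming a current NumPy imported as np (where the result type of the rank-1 case is still a scalar, in line with the concern about old code):

```python
import numpy as np

x = np.ones((2, 2))
y = np.ones(2)

col_sums = x.sum(axis=0)   # rank-2 input: result is a rank-1 ndarray
total = y.sum(axis=0)      # rank-1 input: result is an array scalar

print(type(col_sums).__name__, col_sums.shape)
print(type(total).__name__, isinstance(total, np.ndarray))
```

Under proposal 2.a the second result would instead be a rank-0 ndarray, which is what every existing sum(y) call on a rank-1 array would start returning.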

>   b. Allow axis to be a sequence of ints and make
>x.sum(axis=range(rank(x))) return a rank-0 array according to the rule
>2.a above.
>  
>
So, this would sum over multiple axes?  I guess I'm not opposed to 
something like that, but I'm not really excited about it either.   Would 
that make sense for all methods that take the axis= argument?
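
As an illustration of what a multi-axis reduction looks like, here is a sketch against a current NumPy (an assumption: it accepts a tuple of ints for axis in sum and max; x.ndim stands in for rank(x), and the result is a scalar rather than the rank-0 array proposed in 2.b).

```python
import numpy as np

x = np.arange(6.0).reshape(2, 3)

all_axes = tuple(range(x.ndim))    # (0, 1): every axis of x

total = x.sum(axis=all_axes)       # reduce over all axes at once
largest = x.max(axis=all_axes)     # the same idea for another reduction

print(total, largest)
print(type(total).__name__)
```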

>   c. Make x.sum() raise an error for rank-0 arrays and scalars, but
>allow x.sum(axis=()) to return x.  This will make numpy sum consistent
>with the built-in sum that does not work on scalars.
>
>  
>
I don't think I like this at all. 

This proposal has more far-reaching implications (and would require more 
code changes --- though the axis= arguments do have a converter function 
and so would not be as painful as one might imagine).

In short, I don't feel as enthused about portion 2 of your proposal.
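
For comparison with proposal 2.c, a sketch of how these cases behave with a current NumPy and the built-in sum (an assumption: this reflects today's releases, not the 2006 code; builtin_ok is an illustrative name).

```python
import numpy as np

x = np.ones((2, 2))
r0 = np.array(3.0)

total = r0.sum()              # summing a rank-0 array does not raise
unchanged = x.sum(axis=())    # empty axis tuple: nothing is reduced

# The built-in sum, by contrast, refuses a bare scalar.
try:
    sum(3.0)
    builtin_ok = True
except TypeError:
    builtin_ok = False

print(total, unchanged.shape, builtin_ok)
```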

>3. This is a really small change. Currently
>
>>>> empty(())
>array(0)
>
>but
>
>  
>
>
>I propose to make shape=() valid in the ndarray constructor.
>  
>
+1
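
A minimal sketch of proposal 3, assuming a current NumPy imported as np, where an empty shape tuple is accepted by both empty() and the ndarray constructor. The contents of both arrays are uninitialized, so only the shape is inspected here.

```python
import numpy as np

e = np.empty(())              # rank-0 array via the helper function
n = np.ndarray(shape=())      # rank-0 array via the constructor itself

print(e.shape, e.ndim)
print(n.shape, n.ndim)
```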

I think we need more thinking about rank-0 arrays before doing something 
like proposal 2.  However, 1 and 3 seem simple enough to move forward 
with...

-Travis