[PYTHON MATRIX-SIG] A problem with slicing

Jim Fulton, U.S. Geological Survey jfulton@usgs.gov
Thu, 14 Sep 1995 10:15:42 -0400


On Thu, 14 Sep 1995 09:20:28 -0400 
Guido van Rossum said:
> [The third in a series of short essays on subjects raised in the
> Matrix discussion.]
> 
> Here's a problem where I have neither a strong opinion nor a perfect
> solution...
> 
> Jim Fulton proposes an elegant indexing syntax for matrix objects
> which doesn't require any changes to the language:
> 
> 	M[i][j]
> 
> references the element at column i and row j (or was that column j and
> row i?  Never mind...).

Actually, it's element j of sub-matrix i.  If M is a 2-d matrix, then
you may choose to call submatrices either "rows" or "columns".  I
prefer "columns".
 
> This nicely generalizes to slicing, so you can write:
> 
> 	M[i][j1:j2]
> 
> meaning the column vector at column i with row indices j1...j2-1.
> 
> Unfortunately, the analogous expression for a row vector won't work:
> 
> 	M[i1:i2][j]
> 
> The reason for this is that it works by interpreting M as a sequence
> of columns (and it's all evaluated one thing at a time -- M[i][j]
> means (M[i])[j], and so on).  M[i] is column i, so M[i][j] is the
> element at row j thereof.  But slice semantics imply that if M is a
> sequence of X'es, then M[i1:j1] is still a sequence of X'es -- just
> shorter.  So M[p:q][r] is really the same as M[p+r] (assuming r<q-p).
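
The slicing behaviour described above can be checked with plain nested
lists. This is only a sketch: the list of lists stands in for a matrix
stored as a sequence of columns, and the values and the names p, q, r
are made up for illustration.

```python
# Nested lists stand in for a matrix stored as a sequence of columns.
M = [[10, 11, 12],
     [20, 21, 22],
     [30, 31, 32],
     [40, 41, 42]]

p, q, r = 1, 3, 1

# Slicing the outer sequence yields a shorter sequence of columns ...
sub = M[p:q]

# ... so indexing the slice picks one whole column, namely M[p + r],
# not element r of each selected column.
picked = M[p:q][r]
same_column = picked == M[p + r]

# Getting element r of each selected column needs an explicit loop.
row_piece = [column[r] for column in M[p:q]]
```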
> 
> 
> One way out of this is to adopt the syntax
> 
> 	M[i, j]
> 
> for simple indexing.  This would require only a minor tweaking of the
> grammar I believe. 

In fact, this could be as simple as saying that the comma operator
generates tuples inside of []s.  That is:

  M[i,j] is equivalent to M[(i,j)].

or even:

  M[i,] is equivalent to M[(i,)]
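
A small sketch of what the object would see under this rule. Current
Python does hand a comma-separated subscript to __getitem__ as a single
tuple; the KeyEcho class is a hypothetical helper that just returns the
key it was given.

```python
class KeyEcho:
    """Hypothetical helper: returns whatever key [] receives."""

    def __getitem__(self, key):
        return key


M = KeyEcho()
i, j = 2, 5

pair = M[i, j]        # the commas build a tuple inside []
explicit = M[(i, j)]  # same key, written with explicit parentheses
single = M[i,]        # a trailing comma gives a one-element tuple
plain = M[i]          # no comma: the key is the bare integer
```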

> This could be extended to support
> 
> 	M[i1:i2, j]
> 	M[i1:i2, j1:j2]
> 	M[i, j1:j2]
> 
> (and of course higher-dimensional equivalents).
> 
> This would require considerable changes to the run-time architecture
> of slicing and indexing, since currently everything is geared
> towards one-dimensional indexing/slicing, but I suppose it would be
> doable.

I agree.
 
> (Funny how I'm accepting this possibility of changing the language
> here, while I'm violently opposed to it for operator definitions.  I

Yeah.  Strange even! ;-)

> guess with adding operators there is no end to the number of new
> operators you could dream up, so there would be no end to the change;
> while here there's a clear-cut one-time change.)

Hm.

I really don't think this is a good idea.  I don't really think we
need M[i1:i2, j1:j2]. M[range(i1,i2),range(j1,j2)] is fine for me.

Plus, it also allows: M[(1,3,5),(2,4,6)], in other words, we can
simply allow a sequence of indexes for a dimension and then let range
generate the desired sequence when we want a range.
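
A rough sketch of how a type might accept one index sequence per
dimension. The Matrix class here is a hypothetical toy, and treating
the two sequences as a cross-product selection (every listed column
combined with every listed row) is an assumption; the text above does
not pin down that detail.

```python
class Matrix:
    """Hypothetical toy matrix stored as a list of columns."""

    def __init__(self, columns):
        self.columns = [list(c) for c in columns]

    def __getitem__(self, key):
        # Expect one index sequence per dimension: (column indexes, row indexes).
        cols, rows = key
        return Matrix([[self.columns[i][j] for j in rows] for i in cols])


# Element at column i, row j is i*10 + j, so results are easy to read.
M = Matrix([[i * 10 + j for j in range(8)] for i in range(6)])

block = M[range(1, 3), range(2, 5)]   # plays the role of M[1:3, 2:5]
picked = M[(1, 3, 5), (2, 4, 6)]      # arbitrary index sequences work too
```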

> 
> Of course adopting such a change would completely ruin any possibility
> of using things like
> 
> 	M[3, 4, 7] = [1, 10, 100]
> 
> as roughly equivalent to
> 
> 	M[3] = 1
> 	M[4] = 10
> 	M[7] = 100
> 
> but then again I'm not too fond of that anyway (as a matter of fact,
> I'd oppose it strongly).
> 
> 
> Some other things that I haven't completely followed through, and that
> may cause complications for the theoretical foundation of it all:
> 
> - Allowing M[i, j] for (multidimensional) sequence types would also
> mean that D[i, j] would be equivalent to D[(i, j)] for
> dictionaries.

I see no reason to support M[i,j] for arbitrary sequence types.  I'd
say that if a type wants to support multiple arguments to [], then it
should provide mapping behavior and have the mapping implementation
sniff for either an integer or a tuple argument and do the right
thing.  

I am *very much* against a language change to support this.
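
As a sketch of the "sniffing" idea, here is a hypothetical toy Matrix
whose subscript handler checks whether it got an integer or a tuple.
The class, its column-list storage, and the error message are all
assumptions made for illustration, not the actual matrix type.

```python
class Matrix:
    """Hypothetical toy 2-d matrix stored as a list of columns."""

    def __init__(self, columns):
        self.columns = [list(c) for c in columns]

    def __getitem__(self, key):
        if isinstance(key, tuple):
            # M[i, j]: element j of sub-matrix (column) i.
            i, j = key
            return self.columns[i][j]
        if isinstance(key, int):
            # M[i]: the whole sub-matrix (column) i.
            return self.columns[key]
        raise TypeError("index must be an integer or a tuple of integers")


M = Matrix([[1, 2, 3], [4, 5, 6]])

element = M[1, 2]   # tuple key
column = M[1]       # integer key
chained = M[1][2]   # index the column returned by M[1]
```

With this dispatch, M[i][j] and M[i, j] reach the same element.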

> - Should M[i][j] still be equivalent to M[i, j]?

Yes. M[i,j] is really a compact form of M[((i),(j))].
 
> - Now we have multidimensional sequence types, should we have a
> multidimensional equivalent of len()?  Some ideas:

I'm against multi-dimension sequence types. 8->
 
>   - len(a, i) would return a's length in dimension i; len(a, 0) == len(a)
> 
>   - dim(a) (or rank(a)?) would return the number of dimensions
> 
>   - shape(a) would return a tuple giving a's dimensions, e.g. for a
>   3x4 matrix it would return (3, 4), and for a one-dimensional
>   sequence such as a string or list, it would return a singleton
>   tuple: (len(a),).

Unnecessary.  Matrices can provide special methods for this.
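
For instance, a matrix type could carry this information itself. The
sketch below uses a hypothetical toy class; the method names shape()
and rank() are borrowed from the ideas quoted above and are not an
existing API.

```python
class Matrix:
    """Hypothetical toy 2-d matrix stored as a list of columns."""

    def __init__(self, columns):
        self.columns = [list(c) for c in columns]

    def __len__(self):
        # len() keeps its one-dimensional meaning: the number of columns.
        return len(self.columns)

    def shape(self):
        # Length along each dimension, as a tuple.
        n_rows = len(self.columns[0]) if self.columns else 0
        return (len(self.columns), n_rows)

    def rank(self):
        # Number of dimensions.
        return len(self.shape())


M = Matrix([[0] * 4 for _ in range(3)])
dims = M.shape()
ndims = M.rank()
```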

> - How about multidimensional map(), filter(), reduce()?
> 
>   - map() seems easy (except there seems to be no easy way to specify
>   the rank of the operator): it returns a similarly shaped
>   multidimensional object whose elements are the function results for
>   the corresponding elements of the input matrix
> 
>   - filter() is problematic since the columns won't be of the same
>   length
> 
>   - reduce()??? -- someone who knows APL tell me what it should mean

There have been a number of proposals for generic functions that
operate over matrices in some fashion.  I have not had time to digest
them yet. Stay tuned. :-) (Geez, I really need to get back to my day
job.) 
 
> - Multidimensional for loops?  Or should for iterate over the first
> dimension only?

What is wrong with nested for loops?

Ee-gads, what's gotten into you? :-]
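
A minimal sketch of the nested-loop approach, with a list of lists
standing in for a 2-d matrix (an assumption; any sequence of sequences
would do):

```python
# A list of columns stands in for a 2-d matrix.
M = [[1, 2, 3],
     [4, 5, 6]]

total = 0
visited = []
for i, column in enumerate(M):           # outer loop: first dimension
    for j, element in enumerate(column): # inner loop: second dimension
        visited.append((i, j))
        total += element
```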
 
> One sees there are many potential consequences of a seemingly simple
> change --

Simple? 

I agree that enabling the tuplefication operator, ",", in []s is
simple, but not adding multi-dimensional behavior to sequences.

> that's why I insist that language changes be thought through
> in extreme detail before being introduced...

I really don't see any reason why the matrix type should require
language changes (aside from the minor impact of the tuplefication
operator).

Jim

=================
MATRIX-SIG  - SIG on Matrix Math for Python

send messages to: matrix-sig@python.org
administrivia to: matrix-sig-request@python.org
=================