[Numpy-discussion] untenable matrix behavior in SVN

Gael Varoquaux gael.varoquaux at normalesup.org
Sat Apr 26 11:23:32 EDT 2008


On Sat, Apr 26, 2008 at 11:13:12AM -0400, Alan G Isaac wrote:
> On Sat, 26 Apr 2008, Gael Varoquaux apparently wrote:
> > I claim b is more important than a. IMHO, a is plain 
> > wrong: you shouldn't be indexing x with x[0][0]. 


> Why??

Because a 2D object is not a list of lists. It is more than that. The
numpy array does expose this interface, but I think using it
systematically when it is not required is abuse. Moreover, you are creating
temporary objects that are useless and harmful. They hurt
performance: you have to create intermediate objects and call their
__getitem__ methods. 
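
A small illustration of the point above, assuming a plain 2D NumPy array:
x[0, 0] is a single __getitem__ call with a tuple, while x[0][0] first
builds a temporary 1D array (a view on row 0) and then indexes that. Both
give the same element; the difference is the extra object.

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])

# Multidimensional indexing: one __getitem__ call with the tuple (0, 0).
a = x[0, 0]

# Chained indexing: x[0] creates a temporary 1D array (a view on row 0),
# which is then indexed a second time.
row = x[0]
b = row[0]

print(a, b)                      # 1 1
print(type(row), row.base is x)  # the temporary is a view sharing x's data
```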

> Would you say this about a 2d array?

Of course. Even more so. Suppose you have an nd array with n=10. To index
it with 10 successive indexing operations, i.e. A[a][b]..., you have to create 9
intermediate objects that then get indexed. This is ridiculous and potentially
very harmful for performance.
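
To make the count concrete, here is a sketch that indexes a hypothetical
10-dimensional array (2 elements per axis, chosen only to keep it small)
both ways and records every temporary array that chained indexing creates.

```python
import numpy as np

# A 10-dimensional array with 2 elements along each axis (2**10 values).
A = np.arange(2 ** 10).reshape((2,) * 10)
idx = (1, 0, 1, 1, 0, 0, 1, 0, 1, 1)

# Single multidimensional index: one __getitem__ call, no intermediate arrays.
direct = A[idx]

# Chained indexing A[1][0][1]...: every step but the last returns a new array.
intermediates = []
current = A
for i in idx:
    current = current[i]
    if isinstance(current, np.ndarray):  # the final step yields a scalar
        intermediates.append(current)

print(len(intermediates))                # 9 temporary arrays
print([t.ndim for t in intermediates])   # ndim 9 down to 1
print(direct == current)                 # same element either way
```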

> The core argument has been that it is a **basic 
> expectation** of the behavior of 2d array-like objects that 
> you will be able to get, e.g., the first element with x[0][0].
> (E.g., lists, tuples, arrays ... is there an exception??

For me this is wrong. Lists and tuples are not 2D. Numpy arrays happen to
offer this feature, but you should not use it to do multidimensional
indexing.
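
The difference can be seen directly: a nested list only understands one
index at a time, whereas a 2D array accepts a pair of indices. Chained
indexing on the array works only because arr[0] happens to return a 1D
array. This is a minimal sketch using built-in lists and NumPy.

```python
import numpy as np

nested = [[1, 2], [3, 4]]
arr = np.array(nested)

# A list of lists is just a list whose items happen to be lists.
print(nested[0][0])  # 1
try:
    nested[0, 0]     # lists reject tuple indices
    list_accepts_tuple = True
except TypeError as exc:
    list_accepts_tuple = False
    print("list:", exc)

# A 2D array takes both indices at once; chained indexing also works here.
print(arr[0, 0], arr[0][0])  # 1 1
```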


> I teach with matrices and I thought you might too: if so, 
> you surely have run into this expectation (which is **natural**).

Well, I don't find it that natural. I would naturally expect A[0][0] to
be invalid. I have not run into that expectation much. People around
me are used to indexing with several indices at once. This probably comes from
their Fortran/Mathematica/Maple/Matlab background. I can see that someone
coming from C would expect this chained indexing. This is
because C has no notion of multidimensional objects. The C model is
limited and not as powerful as modern array programming languages; let us
just move along.

> In fact I truly doubt anyone has not been puzzled on first 
> encounter by this::

>     >>> x
>     matrix([[1, 2],
>             [3, 4]])
>     >>> x[0]
>     matrix([[1, 2]])
>     >>> x[0][0]
>     matrix([[1, 2]])

Well, I do grant that this behavior is surprising. When I teach
numpy and somebody encounters this behavior (which doesn't happen often,
because people use multidimensional indexing), I have to point out that
x[0] is also a 2D object.
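
The quoted session can be reproduced with np.matrix, which NumPy still
ships (its documentation now recommends regular arrays instead). Because
any integer index on a matrix yields another 2D matrix, x[0][0] just
selects row 0 of a 1x2 matrix again; indexing both dimensions at once
returns the element.

```python
import numpy as np

x = np.matrix([[1, 2],
               [3, 4]])

# x[0] is a 1x2 matrix, so x[0][0] selects its only row again.
print(x[0].shape)     # (1, 2)
print(x[0][0].shape)  # (1, 2)

# Multidimensional indexing returns the element itself.
print(x[0, 0])        # 1
```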

I sympathize with your crusade to fix this inconvenience. This is why I
support the RowVector/ColumnVector proposal. However, your proposal breaks
behavior that I consider normal and important, in order to support a usage
that I think should be avoided.

Gaël


