[Numpy-discussion] .T Transpose shortcut for arrays again

Bill Baxter wbaxter at gmail.com
Thu Jul 6 21:56:43 EDT 2006


Tim wrote:

> That second argument is particularly uncompelling, but I think I agree
> that in a vacuum swapaxes(-2,-1) would be a better choice for .T than
> reversing the axes. However, we're not in a vacuum and there are several
> reasons not to do this.
>    1. A.T and A.transpose() should really have the same behavior.


There may be a certain economy to that, but I don't see why it should
necessarily be so.  Especially if it's agreed that the behavior of
.transpose() is not very useful.  The use case for .T is primarily to make
linear algebra stuff easier.  If you're doing n-dim stuff and need something
specific, you'll use the more general .transpose().

> 2. Changing A.transpose would be one more backwards compatibility issue.


Maybe it's a change worth making though, if we are right in saying that the
current .transpose() for ndim>2 is hardly ever what you want.

> 3. Since, as far as I can tell, there's no concise way of spelling
> A.swapaxes(-2,-1) in terms of A.transpose, it would make documenting and
> explaining the default case harder.

Huh?  A.swapaxes(-2,-1) is pretty concise.  Why should it have to have an
explanation in terms of A.transpose?  Here's the explanation for the
documentation: "A.T returns A with the last two axes transposed.  It is
equivalent to A.swapaxes(-2,-1).  For a 2-d array, this is the usual matrix
transpose."  This is just a non-issue.


Sasha wrote:

> > more common to want to swap just two axes, and the last two seem a
> > logical choice since a) in the default C-ordering they're the closest
> > together in memory and b) they're the axes that are printed contiguously
> > when you say "print A".
>
> It all depends on how you want to interpret a rank-K tensor.  You seem
> to advocate a view that it is a (K-2)-rank array of matrices and .T is
> an element-wise transpose operation.  Alternatively I can expect that
> it is a matrix of (K-2)-rank arrays and then .T should be
> swapaxes(0,1).  Do you have real-life applications of swapaxes(-2,-1)
> for rank > 2?

Yep, like Tim said.  The usage is, say, N sets of basis vectors.  Each set
of basis vectors is a matrix.  Say I have a different basis associated with
each of N points in space.  Usually I'll want to print it out organized by
basis-vector set, i.e. look at the matrix associated with each of the
points.  So it makes sense to organize it as shape=(N,a,b), so that if I
print it I get something that's easy to interpret.  If I set it up as
shape=(a,b,N), then what's easiest to see in the print output is all N first
basis vectors, all N second basis vectors, etc.  Also, again in a C memory
layout, the last two axes are closest in memory, so it's more cache-friendly
to have the bits that will usually be used together in computations be on
the trailing end.  In Matlab (which is Fortran-ordered), I do things the
other way, with the N at the end of the shape.  (And note that Matlab prints
out the first two axes contiguously.)

Either way, swapaxes(-2,-1) is more likely to be what you want than
.transpose().
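
To make that concrete, here's a minimal sketch of the (N,a,b) layout
(hypothetical data, just to show the shape logic):

    import numpy as np

    N = 4
    # One 3x3 basis matrix for each of N points: shape (N, 3, 3).
    bases = np.tile(np.eye(3), (N, 1, 1))

    # Printing 'bases' shows N readable 3x3 blocks, one per point, and
    # transposing every basis at once is exactly the swapaxes(-2,-1) case:
    transposed = bases.swapaxes(-2, -1)   # per-matrix transpose
    print(transposed.shape)               # (4, 3, 3)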


> > > and swapaxes(-2,-1) is invalid for rank < 2.
> >
> > At least in numpy 0.9.8, it's not invalid, it just doesn't do anything.
>
> That's bad.  What sense does it make to swap non-existing axes?  Many
> people would expect transpose of a vector to be a matrix.  This is the
> case in S+ and R.

Well, I would be really happy for .T to return an (N,1) column vector if
handed an (N,) 1-d array.  But I'm pretty sure that would raise more furor
among the readers of the list than leaving it 1-d.
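
For concreteness, here's the hypothetical behavior I mean, spelled with
calls that exist today:

    import numpy as np

    v = np.arange(3)            # shape (3,)
    col = v.reshape(-1, 1)      # shape (3, 1): what a column-making .T would return
    row = col.swapaxes(-2, -1)  # shape (1, 3): transposing back gives a row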

> > > My main objection is that a.T is fairly cryptic - is there any other
> > > language that uses an attribute for transpose?
> >
> > Does it matter what other languages do?  It's not _that_ cryptic.
>
> If something is clear and natural, chances are it was done before.


The thing is, most other numerical computing languages were designed for
doing numerical computing.  They weren't originally designed for writing
general-purpose software, like Python was.  So in Matlab, for instance,
transpose is a simple single quote.  But that doesn't help us decide what it
should be in numpy.

> For me prior art is always a useful guide when making a design choice.
> For example, in R, the transpose operation is t(a) and works on rank
> <= 2 only, always returning rank-2.


I have serious reservations about a function called t().  x, y, z, and t are
probably all in the top 10 variable names in scientific computing.

> K (an APL-like language) overloads unary '+' to do swapaxes(0,1) for
> rank>=2 and nothing for lower rank.


Hmm.  That's kind of interesting, but it seems like an abuse of notation to
me.  And precedence might be an issue too: the precedence of unary + isn't
as high as attribute access.  Anyway, as far as the meaning of + in K goes,
I'm guessing K's arrays are in Fortran order, so the (0,1) axes vary the
fastest.  I couldn't find any documentation for the K language from a quick
search, though.
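
Just to make the K idea concrete in Python terms, here's a hypothetical
subclass (a sketch, not a proposal) that overloads unary + the way K
apparently does:

    import numpy as np

    class KArray(np.ndarray):
        def __pos__(self):
            # K-style unary '+': swap the first two axes for rank >= 2,
            # do nothing for lower rank.
            if self.ndim >= 2:
                return self.swapaxes(0, 1)
            return self

    a = np.arange(6).reshape(2, 3).view(KArray)
    print((+a).shape)   # (3, 2)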

> Both R and K solutions are implementable in Python, with R using 3
> characters and K using 1(!) compared to your two-character ".T"
> notation.  I would suggest that when inventing something new, you
> should consider prior art and explain how your invention is better.
> That's why what other languages do matters.  (After all, isn't 'T'
> chosen because "transpose" starts with "t" in the English language?)


Yes, you're right.  My main thought was just what I said above: there
probably aren't too many other examples that can really apply in this case,
both because most numerical computing languages are custom-designed for
numerical computing, and because Python-style attributes are kind of
uncommon among programming languages.  So it's worth looking at other
examples, but in the end it has to be something that makes sense for a
numerical computing package written in Python, and there aren't too many
examples of that around.

> > You could write a * b.transpose(1,0) right now and still not know
> > whether it was matrix or element-wise multiplication.
>
> Why would anyone do that if b was a matrix?

Maybe because, like you, they think "that a.T is fairly cryptic".

> > But probably a better solution would be to have matrix versions of
> > these in the library as an optional module to import, so people could,
> > say, import them as M and use M.ones(2,2).
>
> This is the solution used by ma, which is another argument for it.

Yeah, I'm starting to think that's better than slapping an .M attribute on
arrays, too.  Is it hard to write a module like that?
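
It shouldn't be.  Here's a minimal sketch (matrixutils is a hypothetical
module name; the wrappers just hand off to numpy and matrix-ify the result):

    # matrixutils.py -- import it as M, then use M.ones((2,2)) etc.
    import numpy

    def ones(*args, **kwargs):
        return numpy.asmatrix(numpy.ones(*args, **kwargs))

    def zeros(*args, **kwargs):
        return numpy.asmatrix(numpy.zeros(*args, **kwargs))

    def eye(*args, **kwargs):
        return numpy.asmatrix(numpy.eye(*args, **kwargs))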

> I only raised a mild objection against .T, but the slippery slope
> argument makes me dislike it much more.  At the very least I would
> like to see a discussion of why a.T is better than t(a) or +a.

*  A.T puts the T on the proper side of A, so in that sense it looks more
like the standard math notation.
*  A.T has precedence that roughly matches the standard math notation.
*  t(A) uses an impossibly short function name that's likely to conflict
with local variable names.  To avoid the conflict, people will just end up
using it as numpy.t(A), at which point its value as a shortcut for
transpose is nullified.  Or they'll have to do a mini-import within specific
functions ("from numpy import t") to localize the namespace pollution.  But
at that point they might as well just say "t = numpy.transpose".
* t(A) puts the transpose operator on the wrong side of A
* +A puts the transpose operator on the wrong side of A also.
* +A implies addition.  The general rule with operator overloading is that
the overload should have the same general meaning as the original operator.
So overloading * for matrix multiplication makes sense, but overloading &
for it would be a bad idea.  New users looking at something like A + +B are
pretty certain to be confused, because they think they know what + means,
but they're wrong.  If you see A + B.T, you either know what it means or you
know immediately that you don't know what it means, and you go look it up.
* +A has different precedence than the usual transpose operator.  (But I
can't think of a case where that would make a difference now.)


Tim Hochberg wrote:

> > Well, you could overload __rpow__ for a singleton T and spell it A**T
> > ... (I hope no one will take that proposal seriously).  Visually, A.T
> > looks more like a subscript rather than a superscript.
>
> No, no no.  Overload __rxor__, then you can spell it A^t, A^h, etc.
> Much better ;-).  [Sadly, I almost like that....]

Ouch!  No way!  It's got even worse precedence problems than the +A
proposal.  How about A + B^t?  And you still have to introduce 'h' and 't'
into the global namespace for it to work.
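
The precedence problem is easy to demonstrate without numpy at all.  Here's
a tiny stand-in class (purely illustrative) showing how Python would parse
A + B^t:

    class Mat(object):
        # Tiny stand-in for an array type, just to show the parsing.
        def __init__(self, name):
            self.name = name
        def __add__(self, other):
            return Mat('(%s + %s)' % (self.name, other.name))

    class _T(object):
        # Hypothetical transpose singleton for the A^t spelling.
        def __rxor__(self, other):
            return Mat('%s^T' % other.name)

    t = _T()
    A, B = Mat('A'), Mat('B')

    print((A + B ^ t).name)   # prints '(A + B)^T': '+' binds tighter than
                              # '^', so the transpose grabs the whole sum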

> Here's a half-baked thought: if the objection to t(A) is that it doesn't
> mirror the formulae where t appears as a subscript after A, conceivably
> __call__ could be defined so that A(x) returns x(A).  That's kind of
> perverse, but it means that A(t), A(h), etc. could all work
> appropriately for suitably defined singletons.  These singletons could
> either be assembled in some abbreviations namespace or brought in by
> the programmer using "import transpose as t", etc.  The latter works for
> doing t(a) as well, of course.


Same problem with the need for a global t.  And it is kind of perverse,
besides.
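
For what it's worth, here's roughly what that half-baked idea would look
like (hypothetical names, and again not a proposal):

    import numpy as np

    def t(a):
        # The transpose 'singleton' the user would import.
        return np.swapaxes(np.asarray(a), -2, -1)

    class CallableArray(np.ndarray):
        def __call__(self, func):
            # The perverse bit: A(x) just applies x to A, so A(t) == t(A).
            return func(self)

    A = np.arange(6).reshape(2, 3).view(CallableArray)
    print(A(t).shape)   # (3, 2), same as t(A)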


Robert Kern wrote:

> Like Sasha, I'm mildly opposed to .T (as a synonym for .transpose()) and
> much more opposed to the rest (including .T being a synonym for
> .swapaxes(-2,-1)).  It's not often that a proposal carries with it its own
> slippery-slope argument against itself.

The slippery-slope argument only applies to the .M, not the .T or .H.  And I
think if there's a matrixutils module with redefinitions of ones and zeros
etc., and if other functions are all truly fixed to preserve matrix when a
matrix is passed in, then I agree, there's not so much need for .M.

> I don't think that just because arrays are often used for linear algebra
> that linear algebra assumptions should be built in to the core array type.

It's not just that "arrays can be used for linear algebra".  It's that
linear algebra is the single most popular kind of numerical computing in the
world!  It's the foundation for countless fields.  What you're saying is
like "grocery stores shouldn't devote so much shelf space to food, because
food is just one of the products people buy", or "this mailing list
shouldn't be conducted in English, because English is just one of the
languages people can speak here", or "I don't think my keyboard should
devote so much space to the A-Z keys, because there are so many characters
in the Unicode character set that could be there instead", or, to quote from
a particular comedy troupe:

"Ah, how about Cheddar?"
"Well, we don't get much call for it around here, sir."
"Not much ca- It's the single most popular cheese in the world!"
"Not round here, sir."

Linear algebra is pretty much the 'cheddar' of the numerical computing
world.  But it's more than that.  It's like the yeast of the beer world:
pretty much everything starts with it as a base.  It makes sense to make it
as convenient as possible to do with numpy, even if it is a "special case".
I wish I could think of some sort of statistic or Google search I could
cite to back this claim up, but as far as my academic background from high
school through Ph.D. goes, linear algebra is a mighty big deal, not merely
an "also-ran" in the world of math or numerical computing.

Sasha wrote:

> In addition, transpose is a (rank-2) array or matrix operation and not
> a linear algebra operation.  Transpose corresponds to the "adjoint"
> linear algebra operation if you represent vectors as single column
> matrices and co-vectors as single-row matrices.  This is a convenient
> representation followed by much of the relevant literature, but it
> does not allow generalization beyond rank-2.
>

I would be willing to accept a .T that just threw an exception if ndim
were > 2.  That's what Matlab does with its transpose operator.  I don't
like that behavior myself -- it seems wasteful when it could just have some
well-defined behavior that would let it be useful at least some of the time
on N-d arrays.

> I don't like it either, but I don't like .T even more.  These days I
> hate functionality I cannot google for.  Call me selfish, but I
> already know what unary '+' can do to a higher rank array, but with .T
> I will always have to look up which axes it swaps ...


I think '.T' is more likely to be searchable than '+'.  And when you say
you already know what unary + can do, you mean because you've used K?
That's not much use to the typical user, who also thinks they know what a
unary + does, but they'd be wrong in this case.


So, in summary, I vote for:
- Keep the .T and the .H on array
- Get rid of .M
- Instead implement a matrix helper module that could be imported as M,
allowing M.ones(...) etc.

And also:
- Be diligent about fixing any errors from matrix users along the lines of
"numpy.foo returns an array when given a matrix".  (Travis has been good
about this -- but we need to keep it up.)  Part of the motivation for the .M
attribute was just as a band-aid on the problem of matrices getting turned
into arrays.  Having .M means you can just slap a .M on the end of any
result you aren't sure about.  It's better (but harder) to fix the upstream
problem of functions not preserving subtypes.
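
For example, one common upstream fix is just using asanyarray instead of
asarray inside library functions, so subclasses pass through (foo here is a
made-up function name):

    import numpy as np

    def foo(a):
        # asanyarray, unlike asarray, leaves subclasses like matrix
        # intact, so the result keeps the caller's type.
        a = np.asanyarray(a)
        return a * 2

    m = np.matrix([[1, 2], [3, 4]])
    print(type(foo(m)))   # numpy matrix, not a plain ndarray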