[Numpy-discussion] r_, c_, hstack, and vstack with 1-d arrays

Thu Jul 20 21:51:07 EDT 2006

I looked into the various concatenation methods a bit more to better
understand what's going on under the hood.

Here's essentially what these different methods do:

vstack(tup):
    concatenate( map(atleast_2d,tup), axis=0 )

hstack(tup):
    concatenate( map(atleast_1d,tup),axis=1 )

column_stack(tup):
    arrays = map( transpose,map(atleast_2d,tup) )
    concatenate(arrays,1)

(Note that column_stack transposes *everything*, not just 1-d inputs,
so it doesn't do quite what I thought it did, which was to transpose
only 1-d inputs.)

The above 3 are pretty much exactly the code used by numpy.  That's
all there is to those 3 functions.
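
To make the column_stack surprise concrete, here is a minimal sketch
that re-implements the vstack and column_stack definitions above as
standalone functions.  The names vstack_as_described and
column_stack_as_described are made up for this illustration and are not
numpy API; list comprehensions stand in for map.

```python
import numpy as np

def vstack_as_described(tup):
    # atleast_2d turns a 1-d array of length n into shape (1, n),
    # so each 1-d input becomes one row.
    return np.concatenate([np.atleast_2d(a) for a in tup], axis=0)

def column_stack_as_described(tup):
    # Every input is promoted to 2-d and then transposed,
    # including inputs that were already 2-d.
    arrays = [np.transpose(np.atleast_2d(a)) for a in tup]
    return np.concatenate(arrays, axis=1)

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
m = np.array([[1, 2, 3], [4, 5, 6]])        # shape (2, 3)

rows = vstack_as_described((a, b))          # shape (2, 3)
cols = column_stack_as_described((a, b))    # shape (3, 2)
flipped = column_stack_as_described((m,))   # shape (3, 2)
```

The last call is the odd one: a 2-d input that already has columns
comes back transposed.
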

For r_ and c_ I'm summarizing, but effectively they seem to be doing
something like:

r_[args]:
    concatenate( map(atleast_1d,args),axis=0 )

c_[args]:
    concatenate( map(atleast_1d,args),axis=1 )

c_ behaves almost exactly like hstack, with the addition that range
literals are allowed.
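
A small sketch of what a range literal looks like in practice, shown
with r_ because its 1-d result is easy to read.  It assumes "range
literal" means slice notation inside the brackets; the variable names
are illustrative only.

```python
import numpy as np

# Slice notation inside the brackets is expanded like arange(1, 4),
# and the scalars that follow are appended to it.
mixed = np.r_[1:4, 10, 11]

# The same values assembled by hand, for comparison.
by_hand = np.concatenate(
    [np.arange(1, 4), np.atleast_1d(10), np.atleast_1d(11)], axis=0
)
```

Both mixed and by_hand hold the values 1, 2, 3, 10, 11.
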

r_ is most like vstack, but a little different since it effectively
uses atleast_1d, instead of atleast_2d.  So you have
>>> numpy.vstack((1,2,3,4))
array([[1],
       [2],
       [3],
       [4]])
but
>>> numpy.r_[1,2,3,4]
array([1, 2, 3, 4])

However, for cases like that, with just 0-d or 1-d inputs, c_ behaves
identically to r_, so if you wanted a 1-d output you could just as
well have used c_.

So I take back what I said about wishing c_ were like column_stack.
Column stack is weird.
Instead, I think the right thing to do would be to make r_ behave more
like vstack.  I think that would make things more consistent, and
leave the user with less to remember.

After making that change, to make things even more consistent, it
might make sense to rename r_ and c_ to v_ and h_ instead.  Then it's
easy to remember that 'v_' is like 'vstack' and 'h_' is like 'hstack'.

Furthermore, I propose that column_stack should only transpose its 1d
inputs.  "Stack columns" definitely doesn't imply to me that something
that already has columns will be transposed.  Currently it is
documented to only work on 1d inputs, so hopefully that's a change
that wouldn't affect too many people.  The function in
numpy/lib/shape_base.py could be replaced with this:

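def column_stack(tup):
    def transpose_1d(array):
        # Only 0-d and 1-d inputs are turned into columns;
        # inputs that are already 2-d pass through untouched.
        if array.ndim < 2:
            return _nx.transpose(atleast_2d(array))
        return array
    arrays = map(transpose_1d, map(atleast_1d, tup))
    return _nx.concatenate(arrays, 1)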
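
For trying the proposal outside the numpy source tree, here is a
standalone sketch of the same logic.  The name column_stack_proposed is
hypothetical, np stands in for the module-internal _nx, and a list
comprehension replaces map; it is an illustration, not a patch.

```python
import numpy as np

def column_stack_proposed(tup):
    # Hypothetical standalone version of the proposal, not numpy's own code.
    def transpose_1d(array):
        # 0-d and 1-d inputs become a single column; 2-d inputs are left alone.
        if array.ndim < 2:
            return np.transpose(np.atleast_2d(array))
        return array
    arrays = [transpose_1d(np.atleast_1d(a)) for a in tup]
    return np.concatenate(arrays, axis=1)

x = np.array([1, 2, 3])                        # 1-d, becomes a (3, 1) column
m = np.array([[10, 20], [30, 40], [50, 60]])   # already (3, 2), kept as is

out = column_stack_proposed((x, m))            # shape (3, 3)
```

Here x is stacked as the first column and the columns of m follow
unchanged, which is what "stack columns" suggests to me.
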

If r_ and c_ get renamed to v_ and h_, then c_ could be re-introduced
with behavior similar to column_stack.

Finally, I noticed that the atleast_nd methods return arrays
regardless of input type.  At a minimum, atleast_1d and atleast_2d on
matrices should return matrices.  I'm not sure about atleast_3d, since
matrices can't be 3d.  (But my opinion is that the matrix type should
be allowed to be 3d.)  Anyway, since these methods are used by the
*stack methods, those also do not currently preserve the matrix type
(in SVN numpy).
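
One way type preservation could work, sketched under assumptions:
Tagged is a made-up ndarray subclass standing in for the matrix type,
and the two atleast_2d_* helpers are hypothetical, not numpy's
implementation.  The only difference between them is asarray versus
asanyarray.

```python
import numpy as np

class Tagged(np.ndarray):
    """Hypothetical ndarray subclass standing in for the matrix type."""

def atleast_2d_plain(a):
    # asarray always hands back a base-class ndarray, so the subclass is lost.
    a = np.asarray(a)
    return a.reshape(1, -1) if a.ndim < 2 else a

def atleast_2d_preserving(a):
    # asanyarray passes ndarray subclasses through unchanged.
    a = np.asanyarray(a)
    return a.reshape(1, -1) if a.ndim < 2 else a

t = np.arange(3).view(Tagged)      # 1-d instance of the subclass

plain = atleast_2d_plain(t)        # base ndarray, shape (1, 3)
kept = atleast_2d_preserving(t)    # still Tagged, shape (1, 3)
```

Any *stack function built on the preserving variant would then have a
chance of returning the input type as well.
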

SUMMARY:
* make r_ behave like "vstack plus range literals"
* make column_stack only transpose its 1d inputs.
* rename r_,c_ to v_,h_ (or something else) to make their connection
with vstack and hstack clearer.  Maybe vs_ and hs_ would be better?
* make a new version of 'c_' that acts like column_stack so that
there's a nice parallel v_<=>vstack,  h_<=>hstack, c_<=>column_stack
* make atleast_*d methods preserve the input type whenever possible

Thoughts?
--bb