[SciPy-User] accessing a set of columns from a recarray

Sun Mar 21 22:33:06 EDT 2010

On Sun, Mar 21, 2010 at 10:20 PM, Vincent Davis
<vincent at vincentdavis.net> wrote:
>
> Too many distractions; let me try to write that a little better.
> I have a record array and a list of columns for which I would like to get the row means. My current solution is to iterate through the list of column names and make a new "normal" array, then calculate the row means. I would like to do something like np.mean(A['x','y','z']) where x, y, z are the titles of the columns.
>
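The loop-based approach described in the question might look roughly like the sketch below. `A` and `cols` are hypothetical stand-ins for the poster's data; the loop fills one column of a preallocated float array per field name, then takes the mean of each row.

```python
import numpy as np

# Hypothetical record array standing in for the poster's "A".
A = np.rec.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)],
                 dtype=[('x', '<f8'), ('y', '<f8'), ('z', '<f8')])
cols = ['x', 'y', 'z']  # hypothetical list of column names

# Iterate through the column names, copying each field into one column
# of an ordinary (rows, len(cols)) float array.
plain = np.empty((len(A), len(cols)))
for j, name in enumerate(cols):
    plain[:, j] = A[name]

# Mean across the columns of each row.
row_means = plain.mean(axis=1)
print(row_means)
```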

If you have a rec array like

import numpy as np
Y = np.rec.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)],
      dtype=[('var1', '<f8'), ('var2', '<f8'), ('var3', '<f8')])

you can select several columns (fields) at once like this:

Y[['var1','var2','var3']]

Note that the field names are passed as a list inside the brackets.
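To see what the list index returns, here is a small sketch using the same Y. The result is still a record array with one record per row, not a 2-D array, while a single field name (a plain string rather than a list) gives back an ordinary 1-D array.

```python
import numpy as np

Y = np.rec.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)],
                 dtype=[('var1', '<f8'), ('var2', '<f8'), ('var3', '<f8')])

# A list of field names selects several columns at once; the result is
# still structured, with one record per row.
sub = Y[['var1', 'var2', 'var3']]
print(sub.dtype.names)  # the selected field names
print(sub.shape)        # one record per row, not a 2-D shape

# A single field name gives a plain 1-D array of that column.
print(Y['var2'])
```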

If you want a "normal" (non-structured) 2-D array, I like this approach
that Pierre recently pointed out. The 3 is the number of columns; the
number of rows is filled in automatically.

Y[['var1','var2','var3']].view((float,3))

Note the tuple passed to view; this works when the fields are all
floats. Taking a view might not work if the var# fields have different
types, such as ints and floats.
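For the mixed-type case, one option in newer NumPy releases (1.16 and later, as far as I know) is numpy.lib.recfunctions.structured_to_unstructured, which copies the selected fields into a regular 2-D array and casts them to a common dtype. This is a sketch of a different technique from the view trick above, not what Pierre suggested; the array Z and its int/float fields are made up for illustration.

```python
import numpy as np
import numpy.lib.recfunctions as rfn

# Hypothetical structured array mixing an integer field with float fields.
Z = np.array([(1, 2.0, 3.0), (4, 5.0, 6.0), (7, 8.0, 9.0)],
             dtype=[('var1', '<i8'), ('var2', '<f8'), ('var3', '<f8')])

# Copy the selected fields into an ordinary 2-D array, casting every
# field to float so the int and float columns can sit side by side.
plain = rfn.structured_to_unstructured(Z[['var1', 'var3']], dtype=float)
print(plain)
print(plain.mean(axis=1))
```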

If you want the mean of each row (the mean across the columns, axis=1):

Y[['var1','var2','var3']].view((float,3)).mean(1)
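One caveat if you are on a newer NumPy: since 1.16, indexing with a list of fields returns a view that keeps the original record size, so the view trick works when you select all three fields but may raise an error or give wrong values for a subset such as var1 and var3. A sketch, assuming NumPy 1.16 or later, that repacks the subset first with numpy.lib.recfunctions.repack_fields. It uses a plain structured array rather than a rec array, which I expect to behave the same here.

```python
import numpy as np
import numpy.lib.recfunctions as rfn

Y = np.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)],
             dtype=[('var1', '<f8'), ('var2', '<f8'), ('var3', '<f8')])

# All fields selected: the layout is exactly three float64s per record,
# so viewing as (float, 3) gives a (3, 3) array.
all_means = Y[['var1', 'var2', 'var3']].view((float, 3)).mean(1)

# Subset selected: the multi-field view still spans the full 24-byte
# record (with a gap where var2 was), so pack it into 16 bytes first.
packed = rfn.repack_fields(Y[['var1', 'var3']])
subset_means = packed.view((float, 2)).mean(1)
print(all_means, subset_means)
```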

A shortcut that uses every field, taking the names from Y.dtype.names
and the number of fields from len(Y.dtype):

Y[list(Y.dtype.names)].view((float,len(Y.dtype))).mean(1)
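The shortcut relies on Y.dtype.names (the field names in storage order) and len(Y.dtype) (the number of fields). Below, a hypothetical helper, all_field_row_means, wraps the same one-liner and adds an explicit check that every field is float64, since the view is only meaningful in that case.

```python
import numpy as np

def all_field_row_means(arr):
    """Row means over every field of a record array (hypothetical helper).

    Views the records as a plain 2-D float array, which is only valid
    when every field is float64, so that is checked first.
    """
    names = list(arr.dtype.names)  # field names, in storage order
    n = len(arr.dtype)             # number of fields
    if any(arr.dtype[name] != np.float64 for name in names):
        raise TypeError("view((float, n)) needs every field to be float64")
    return arr[names].view((float, n)).mean(1)

Y = np.rec.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)],
                 dtype=[('var1', '<f8'), ('var2', '<f8'), ('var3', '<f8')])
print(all_field_row_means(Y))
```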

Also, for now, the columns will be given back to you in the order they
appear in the array, no matter which order you ask for them in. A patch
has been submitted to return them in the order you ask for, which I hope
gets picked up...
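Whichever order multi-field indexing uses in the NumPy version you have installed, pulling the fields out one at a time and stacking them gives the columns in exactly the order you list them. A sketch; wanted is a hypothetical list of field names, deliberately not in storage order.

```python
import numpy as np

Y = np.rec.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)],
                 dtype=[('var1', '<f8'), ('var2', '<f8'), ('var3', '<f8')])

wanted = ['var3', 'var1']  # hypothetical request, reversed relative to storage

# Each Y[name] is a 1-D column; column_stack places them left to right
# in the order of `wanted`, regardless of how the fields are stored.
plain = np.column_stack([Y[name] for name in wanted])
print(plain)
```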

Skipper