[Numpy-discussion] indexing recarrays

Mon Jul 20 12:05:16 EDT 2009

On Jul 20, 2009, at 7:54 AM, John [H2O] wrote:

> I have a file containing mixed data types: strings, floats, datetime  
> output (i.e. strings), and ints. Something like:
> #ID, name, date, value
> 1,sample,2008-07-10 12:34:20,344.56
> Presuming I get them nicely into a recarray (see my other post) then  
> I would like to work with them using the convention that I have  
> become used to:
>
> D[:,0] #means D for all rows, column 0
>
> But it seems now I have to use:
>
> D['columnID']
>
> Am I missing something, or is there a way to keep referencing  
> columns by the column integer index??
>
Check the shape of your array: it should be (N,), where N is the
number of lines. That tells you the array is 1D, without any
columns. What you think of as a column is actually a field,
and you must access it by name, as D['columnID'].
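If you still want to address a field by its position, one possible sketch is to look the name up in the dtype first. The field names and sample rows below are made up to resemble the line in the question; they are assumptions, not your actual data.

```python
import numpy as np

# Hypothetical records modelled on the sample line in the question.
D = np.array(
    [(1, 'sample', '2008-07-10 12:34:20', 344.56),
     (2, 'other', '2008-07-11 08:00:00', 12.5)],
    dtype=[('ID', 'i4'), ('name', 'U10'), ('date', 'U19'), ('value', 'f8')],
)

print(D.shape)        # (2,): one dimension, so D[:, 0] cannot work
print(D.dtype.names)  # ('ID', 'name', 'date', 'value')

# Translate a positional index into a field name, then index by that name.
first_field = D[D.dtype.names[0]]
print(first_field)    # [1 2]
```

This keeps the integer-index habit while still going through the field name, which is what the array actually understands.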
Now, it is possible to have a 2D structured array, like this one:
 >>> import numpy as np
 >>> x = np.array([[(1, 1.0), (2, 2.0)],
 ...               [(3, 3.0), (4, 4.0)]],
 ...              dtype=[('f0', '<i4'), ('f1', '<f8')])
In that case, you can access the first column:
 >>> x[:,0]
     array([(1, 1.0), (3, 3.0)],
            dtype=[('f0', '<i4'), ('f1', '<f8')])
> On another note... are recarrays and structured arrays the same??
>
Not quite. A recarray is a special structured array that lets you
access fields as attributes (as in D.columnID) as well as items
(as in D['columnID']). Because of this added functionality,
recarrays are a bit slower to manipulate.
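
A small sketch of the difference, assuming a plain structured array that is then viewed as a recarray (the variable names a and r are only for illustration):

```python
import numpy as np

# A plain 1D structured array with two fields.
a = np.array([(1, 1.0), (2, 2.0)], dtype=[('f0', '<i4'), ('f1', '<f8')])

# View the same data as a recarray to gain attribute access.
r = a.view(np.recarray)

print(a['f0'])           # item access works on the structured array
print(r['f0'])           # and on the recarray
print(r.f0)              # attribute access works on the recarray
print(hasattr(a, 'f0'))  # False: the plain structured array has no such attribute
```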