[Numpy-discussion] Record arrays

Thu Jun 26 16:38:36 EDT 2008

> Let's be clear, there are two very closely related things: recarrays
> and record arrays. Record arrays are just ndarrays with a complicated
> dtype. E.g.
> 
> In [1]: from numpy import *
> 
> In [2]: ones(3, dtype=dtype([('foo', int), ('bar', float)]))
> Out[2]:
> array([(1, 1.0), (1, 1.0), (1, 1.0)],
>       dtype=[('foo', '<i4'), ('bar', '<f8')])
> 
> In [3]: r = _
> 
> In [4]: r['foo']
> Out[4]: array([1, 1, 1])
> 
> 
> recarray is a subclass of ndarray that just adds attribute access to
> record arrays.
> 
> In [10]: r2 = r.view(recarray)
> 
> In [11]: r2
> Out[11]:
> recarray([(1, 1.0), (1, 1.0), (1, 1.0)],
>       dtype=[('foo', '<i4'), ('bar', '<f8')])
> 
> In [12]: r2.foo
> Out[12]: array([1, 1, 1])
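
For anyone who wants to run this outside an interactive session, here is a minimal standalone sketch of the same two steps. The explicit `np.int32`/`np.float64` field types and the `np.shares_memory` check are my additions, assuming a reasonably recent NumPy; they are not part of the session quoted above.

```python
import numpy as np

# Record (structured) array: a plain ndarray with a compound dtype.
dt = np.dtype([('foo', np.int32), ('bar', np.float64)])
r = np.ones(3, dtype=dt)

# Field access by name works on any structured ndarray.
foo = r['foo']

# Viewing as recarray adds attribute access; no data is copied.
r2 = r.view(np.recarray)
r2.foo[0] = 7  # writes through to the buffer shared with r

shares_memory = np.shares_memory(r, r2)
print(r['foo'], r2.bar, shares_memory)
```

Because `r2` is only a view, going back and forth between the two forms should be cheap, so it seems reasonable to keep the plain record array around and take the recarray view only where attribute access is convenient.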
> 
> 
> One downside of this is that the attribute access feature slows down
> all field accesses, even the r['foo'] form, because it sticks a bunch
> of pure Python code in the middle. Much code won't notice this, but if
> you end up having to iterate over an array of records (as I have),
> this will be a hotspot for you.
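
A rough way to check this on your own machine is to time the same per-record loop over a plain record array and over its recarray view. This is only a sketch: the helper `sum_foo`, the array size, and the repeat count are made up for illustration, and the size of the gap (if any) will depend on the NumPy version.

```python
import timeit
import numpy as np

dt = np.dtype([('foo', np.int32), ('bar', np.float64)])
plain = np.ones(1000, dtype=dt)   # plain record array
rec = plain.view(np.recarray)     # same data, recarray view

def sum_foo(arr):
    # Hypothetical helper: iterate record by record and read one field,
    # which is the access pattern described above.
    total = 0
    for i in range(len(arr)):
        total += arr[i]['foo']
    return total

t_plain = timeit.timeit(lambda: sum_foo(plain), number=20)
t_rec = timeit.timeit(lambda: sum_foo(rec), number=20)
print(f"ndarray: {t_plain:.4f}s  recarray: {t_rec:.4f}s")
```

Whole-column access such as `plain['foo'].sum()` avoids the per-record loop entirely, so the overhead should only matter when the loop cannot be vectorized.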
> 
> Record arrays are fundamentally a part of numpy, and no one is even
> suggesting that they would go away. No one is seriously suggesting
> that we should remove recarray, but some of us hesitate to recommend
> its use over plain record arrays.
> 
> Does that clarify the discussion for you?
> 
Thanks! This has always been something that confused me. This is
awesome, I guess I built my DataFrame object for nothing :-)

Gabriel