[Numpy-discussion] Record arrays
Gabriel Gellner
ggellner at uoguelph.ca
Thu Jun 26 16:38:36 EDT 2008
> Let's be clear, there are two very closely related things: recarrays
> and record arrays. Record arrays are just ndarrays with a complicated
> dtype. E.g.
>
> In [1]: from numpy import *
>
> In [2]: ones(3, dtype=dtype([('foo', int), ('bar', float)]))
> Out[2]:
> array([(1, 1.0), (1, 1.0), (1, 1.0)],
>       dtype=[('foo', '<i4'), ('bar', '<f8')])
>
> In [3]: r = _
>
> In [4]: r['foo']
> Out[4]: array([1, 1, 1])
>
>
> recarray is a subclass of ndarray that just adds attribute access to
> record arrays.
>
> In [10]: r2 = r.view(recarray)
>
> In [11]: r2
> Out[11]:
> recarray([(1, 1.0), (1, 1.0), (1, 1.0)],
>          dtype=[('foo', '<i4'), ('bar', '<f8')])
>
> In [12]: r2.foo
> Out[12]: array([1, 1, 1])
>
>
> One downside of this is that the attribute access feature slows down
> all field accesses, even the r['foo'] form, because it sticks a bunch
> of pure Python code in the middle. Much code won't notice this, but if
> you end up having to iterate over an array of records (as I have),
> this will be a hotspot for you.
>
> Record arrays are fundamentally a part of numpy, and no one is even
> suggesting that they would go away. No one is seriously suggesting
> that we should remove recarray, but some of us hesitate to recommend
> its use over plain record arrays.
>
> Does that clarify the discussion for you?
>
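For anyone who wants to check the field-access overhead described above on their own machine, here is a rough sketch. It assumes a recent NumPy imported as np; the helper time_field_access and the repetition count are made up for illustration and are not part of NumPy. It prints no expected numbers, since the timings depend on the NumPy version and the hardware.

```python
import timeit

import numpy as np

# Structured ("record") array: a plain ndarray with a compound dtype.
r = np.ones(3, dtype=[('foo', int), ('bar', float)])

# The same memory viewed as a recarray, which adds attribute access.
r2 = r.view(np.recarray)


def time_field_access(arr, number=100_000):
    """Hypothetical helper: seconds taken by `number` evaluations of arr['foo']."""
    return timeit.timeit(lambda: arr['foo'], number=number)


plain_seconds = time_field_access(r)
rec_seconds = time_field_access(r2)

print(f"ndarray  r['foo']:  {plain_seconds:.4f} s")
print(f"recarray r2['foo']: {rec_seconds:.4f} s")
```

Both arrays share the same data, so any difference comes from the extra Python-level lookup in recarray, not from the data itself.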
Thanks! This has always been something that has confused me... This is
awesome, I guess I built my DataFrame object for nothing :-)
Gabriel