[Numpy-discussion] structured arrays, recarrays, and record arrays

Allan Haldane allanhaldane at gmail.com
Sun Jan 18 23:36:50 EST 2015


Hello all,

Documentation of recarrays is poor and I'd like to improve it. In order 
to do this I've been looking at core/records.py, and I would appreciate 
some feedback on my plan.

Let me start by describing what I see. In the docs there is some
confusion about 'structured arrays' vs 'record arrays' vs 'recarrays':
the docs often use the three terms interchangeably. They also refer to
structured dtypes variously as 'struct data types', 'record data types'
or simply 'records' (e.g., see the reference/arrays.dtypes and
reference/arrays.indexing doc pages).

But by my reading of the code there are really three (or four) distinct 
types of arrays with structure. Here's a possible nomenclature:
  * "Structured arrays" are simply ndarrays with structured dtypes. That
    is, the data type is subdivided into fields of different type.
  * "recarrays" are a subclass of ndarrays that allow access to the
    fields by attribute.
  * "Record arrays" are recarrays where the elements have additionally
    been converted to 'numpy.core.records.record' type such that each
    data element is an object with field attributes.
  * (it is also possible to create arrays with dtype.type of
    numpy.core.records.record, but which are not recarrays. However, I
    have never seen this done.)
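
To make the first two items concrete, here is a minimal sketch (the
array contents and the variable names are only illustrative). It shows
that a structured array reaches its fields by indexing with the field
name, while a recarray view of the same memory also exposes them as
attributes:

```python
import numpy as np

# A structured array: a plain ndarray whose dtype is split into named fields.
arr = np.array([(1, b'a'), (2, b'b')],
               dtype=[('foo', np.int64), ('bar', 'S1')])

# Fields of a structured array are reached by indexing with the field name.
foo_by_index = arr['foo']

# Viewing the same memory as a recarray adds attribute access to the fields.
recarr = arr.view(np.recarray)
foo_by_attr = recarr.foo

# Both objects share one buffer; the view does not copy the data.
shares_memory = np.shares_memory(arr, recarr)
```

The view is cheap because only the Python-level type changes, which is
why the choice between the two is mostly about access syntax.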

Here's code demonstrating the creation of the different types of array 
(in order: structured array, recarray, ???, record array).

     >>> import numpy as np
     >>> arr = np.array([(1,'a'), (2,'b')],
     ...                dtype=[('foo', int), ('bar', 'S1')])
     >>> recarr = arr.view(type=np.recarray)
     >>> noname = arr.view(dtype=np.dtype((np.record, arr.dtype)))
     >>> recordarr = arr.view(dtype=np.dtype((np.record, arr.dtype)),
     ...                      type=np.recarray)

     >>> type(arr), arr.dtype.type
         (numpy.ndarray, numpy.void)
     >>> type(recarr), recarr.dtype.type
         (numpy.core.records.recarray, numpy.void)
     >>> type(recordarr), recordarr.dtype.type
         (numpy.core.records.recarray, numpy.core.records.record)

Note that the functions numpy.rec.array, numpy.rec.fromrecords,
numpy.rec.fromarrays, and np.recarray.__new__ create record arrays.
However, the docs also contain examples that create plain recarrays,
e.g. in the recarray and ndarray.view docstrings and in
http://www.scipy.org/Cookbook/Recarray. The files
numpy/lib/recfunctions.py and numpy/lib/npyio.py (and possibly masked
arrays, but I haven't looked yet) make extensive use of recarrays (but
not record arrays).
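
As a rough illustration of the constructor functions named above, the
following sketch builds the same two-row table once from row tuples and
once from per-field arrays. The field names and values are made up for
the example:

```python
import numpy as np

# Build a record array from a list of row tuples.
from_rows = np.rec.fromrecords([(1, 'a'), (2, 'b')], names='foo,bar')

# Build a record array from one array per field (column-wise input).
from_cols = np.rec.fromarrays([np.array([1, 2]), np.array(['a', 'b'])],
                              names='foo,bar')

# Elements are numpy.record objects, so fields are attributes on them too.
first_foo = from_rows[0].foo
second_foo = from_cols[1].foo
```

In both cases the result is a recarray whose dtype.type is the record
type, i.e. a record array in the nomenclature proposed above.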

The main functional difference between recarrays and record arrays is 
field access on individual elements:

     >>> recordarr[0].foo
     1
     >>> recarr[0].foo
     Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
     AttributeError: 'numpy.void' object has no attribute 'foo'
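
The same distinction can be sketched at the element level in a way that
does not depend on how a recarray view is created. A plain structured
array stands in for the non-record case here, as an assumption of this
sketch, because the element type produced by arr.view(np.recarray) may
depend on the NumPy version:

```python
import numpy as np

arr = np.array([(1, b'a'), (2, b'b')],
               dtype=[('foo', np.int64), ('bar', 'S1')])

# Explicitly request the record element type together with the recarray class.
recordarr = arr.view(dtype=np.dtype((np.record, arr.dtype)),
                     type=np.recarray)

# An element of a record array is a numpy.record: attribute access works.
via_attribute = recordarr[0].foo

# An element of a plain structured array is a numpy.void: index by field name.
via_index = arr[0]['foo']

# Attribute access on the numpy.void element is expected to fail.
try:
    arr[0].foo
    void_has_attr = True
except AttributeError:
    void_has_attr = False
```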

Also, note that recarrays have a small performance penalty relative to
structured arrays, and record arrays have a further one relative to
recarrays, because of the additional Python logic involved in attribute
lookup.
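
One way to get a feel for these costs is a small timeit comparison. This
is only a sketch: the array size, the repeat count, and the four access
patterns are arbitrary choices, and the absolute numbers will vary with
the machine and the NumPy version, so no expected output is shown.

```python
import timeit

import numpy as np

arr = np.zeros(1000, dtype=[('foo', np.int64), ('bar', 'S1')])
recordarr = arr.view(dtype=np.dtype((np.record, arr.dtype)),
                     type=np.recarray)

n = 2000  # number of repetitions per measurement (arbitrary)

# Whole-field access: by name on the ndarray, by attribute on the recarray.
t_field_index = timeit.timeit(lambda: arr['foo'], number=n)
t_field_attr = timeit.timeit(lambda: recordarr.foo, number=n)

# Single-element access: numpy.void indexing vs numpy.record attribute.
t_elem_index = timeit.timeit(lambda: arr[0]['foo'], number=n)
t_elem_attr = timeit.timeit(lambda: recordarr[0].foo, number=n)

print(f"arr['foo']        {t_field_index:.6f} s")
print(f"recordarr.foo     {t_field_attr:.6f} s")
print(f"arr[0]['foo']     {t_elem_index:.6f} s")
print(f"recordarr[0].foo  {t_elem_attr:.6f} s")
```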

So my first goal in updating the docs is to use the right terms in the
right place. In almost all cases, references to 'records' (e.g. 'record
types') should be replaced with 'structured' (e.g. 'structured types'),
with the exception of docs that deal specifically with record arrays.
It's my guess that in the distant past structured datatypes were 
intended to always be of type numpy.core.records.record (thus the 
description in reference/arrays.dtypes) but that 
numpy.core.records.record became generally obsolete without updates to 
the docs. doc/records.rst.txt seems to document the transition.

I've made a preliminary pass over the docs, which you can see here:
https://github.com/ahaldane/numpy/commit/d87633b228dabee2ddfe75d1ee9e41ba7039e715
Mostly I renamed 'record type' to 'structured type', and added a very 
rough draft to numpy/doc/structured_arrays.py.

I would love to hear from those more knowledgeable than myself on 
whether this works!

Cheers,
Allan


