[SciPy-user] Record Array: How to add a column?

John Hunter jdh2358 at gmail.com
Tue Oct 14 12:55:25 EDT 2008


On Tue, Oct 14, 2008 at 11:28 AM, Pierre GM <pgmdevlist at gmail.com> wrote:
> John,
> Do you plan to have your modifications part of numpy.records ? In any case,
> I'll try to check whether it is easy to add support to missing data:
> MaskedArrays should now support with flexible-types.

I do not have concrete plans, but I have spoken with Jarrod about
moving some of these over, making some of them record array methods,
others available in the np.rec namespace.  I think the consensus is
that these are useful and belong in numpy, but we are awaiting someone
to do the port.

On the subject of masked record arrays: we added masked array support
to mlab.csv2rec some time ago, and it has caused no shortage of
headaches because masked record arrays and regular recarrays expose
different interfaces for the objects they hold.

The following example shows a record array with a 'date' column which
is an O4 (Python object) type. Here is the behavior of the recarray:

  In [212]: !cat test1.csv
  date,age,name
  2008-01-01,10,'tom'
  2008-01-02,11,'dick'
  2008-01-03,12,'harry'
  In [213]: r1 = mlab.csv2rec('test1.csv')

  In [214]: type(r1)
  Out[214]: <class 'numpy.core.records.recarray'>

  In [215]: r1.dtype
  Out[215]: dtype([('date', '|O4'), ('age', '<i4'), ('name', '|S7')])

  In [216]: print r1[0].date.year
  2008

In particular, on a given row of the recarray, I can call object
methods and access object attributes.

In the next example, the data file has a missing value on the last row
in the 'age' column, so csv2rec returns a masked record array (r2 is
the result of calling csv2rec on test2.csv):

  In [217]: !cat test2.csv
  date,age,name
  2008-01-01,10,'tom'
  2008-01-02,11,'dick'
  2008-01-03,,'harry'
  In [218]: type(r2)
  Out[218]: <class 'numpy.ma.mrecords.MaskedRecords'>

  In [219]: print r2.dtype
  [('date', '|O4'), ('age', '<i4'), ('name', '|S7')]

  In [220]: r2[0].date.year
  ------------------------------------------------------------
  Traceback (most recent call last):
    File "<ipython console>", line 1, in ?
  AttributeError: 'MaskedArray' object has no attribute 'year'

It would help us a lot in this regard if we could access the
underlying object.  Is there a reason why the masked array behaves
differently when it comes to accessing the underlying object's methods,
and is there a sensible way to make the two compatible?
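
As a rough sketch of what "accessing the underlying object" could look
like, here is one possible workaround using only numpy.ma. It is built
on assumptions: the array is constructed by hand as a stand-in for the
test2.csv data (no csv2rec, no MaskedRecords), the masked 'age' value
is a placeholder, and it targets a current NumPy, so behavior in other
versions may differ. The idea is to go through the .data attribute,
which gives the plain ndarray behind the masked array:

```python
import datetime

import numpy as np
import numpy.ma as ma

# Hypothetical stand-in for the test2.csv contents, built by hand
# rather than loaded with mlab.csv2rec.
dtype = [('date', object), ('age', 'i4'), ('name', 'U7')]
rows = [
    (datetime.date(2008, 1, 1), 10, 'tom'),
    (datetime.date(2008, 1, 2), 11, 'dick'),
    (datetime.date(2008, 1, 3), 0, 'harry'),  # age is a placeholder, masked below
]
data = np.array(rows, dtype=dtype)

# One mask flag per field per row; only the last row's 'age' is masked.
mask = [(False, False, False),
        (False, False, False),
        (False, True, False)]
r2 = ma.array(data, mask=mask)

# .data is the underlying plain ndarray, so indexing it returns the
# stored Python object itself and its attributes are reachable.
first_date = r2.data['date'][0]
print(first_date.year)

# The mask is still available separately, field by field.
print(r2.mask['age'].tolist())
```

This sidesteps the masked wrapper rather than making the two
interfaces compatible, so the caller has to check the mask itself
before trusting a value pulled out this way.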

Thanks,
JDH


