[Numpy-discussion] recarray

Francesc Altet faltet at carabos.com
Mon Sep 18 06:17:03 EDT 2006


On Mon, 18 Sep 2006 at 09:38 +0200, Lionel Roubeyrie wrote:
> On Friday, 15 September 2006 at 16:05, Francesc Altet wrote:
> > Another possibility is to play with columns directly from the initial
> > recarray. The next is an example:
> >
> > In [101]: ra=numpy.rec.array("1"*36, dtype="a4,i4,f4", shape=3)
> > In [102]: ra
> > Out[102]:
> > recarray([('1111', 825307441, 2.5784852031307537e-09),
> >        ('1111', 825307441, 2.5784852031307537e-09),
> >        ('1111', 825307441, 2.5784852031307537e-09)],
> >       dtype=[('f0', '|S4'), ('f1', '<i4'), ('f2', '<f4')])
> > In [103]: rb=numpy.rec.fromarrays([numpy.array(ra['f0'], 'i4'),ra['f2']],
> > names='f0,f1')
> > In [104]: rb
> > Out[104]:
> > recarray([(1111, 2.5784852031307537e-09), (1111, 2.5784852031307537e-09),
> >        (1111, 2.5784852031307537e-09)],
> >       dtype=[('f0', '<i4'), ('f1', '<f4')])
> >
> > where ra is the original recarray and rb is a derived one whose first
> > column is the first column of ra converted to integers ('i4'), and whose
> > second column is the third column of ra (so the second column of ra has
> > been dropped from rb).
> 
> I have a problem with that :
> lionel[ETD-2006-01__PM2.5_DALTON]334>datas[0:5]
>                          Sortie[334]:
> [['Dates ', 'PM10 ', 'c10', 'PM2.5  ', 'c2.5'],
>  ['05/01/2006', '33', 'A', '', 'N'],
>  ['06/01/2006', '41', 'A', '30', 'A'],
>  ['07/01/2006', '20', 'A', '16', 'A'],
>  ['08/01/2006', '16', 'A', '13', 'A']]
> 
> lionel[ETD-2006-01__PM2.5_DALTON]335>ra=rec.array(datas[1:],formats='a10,i2,a1,i2,a1')
> 
> lionel[ETD-2006-01__PM2.5_DALTON]336>ra[0:5]
>                          Sortie[336]:
> recarray([[('05/01/2006', 0, '', 0, ''), ('33', 0, '', 0, ''),
>         ('A[9\xb4q\x00\x00\x00\xc0\xa3', -18448, '\xc0', -3933, '\xb7'),
>         ('30', 0, '', 0, ''),
>         ('N\x00\x00\x00\x00\x00\x00\x00t\xeb', -18496, '\x19', 13, '')],
>        [('06/01/2006', 0, '', 0, ''), ('41', 0, '', 0, ''),
>         ('A[9\xb4q\x00\x00\x00\xc0\xa3', -18448, '\xc0', -3933, '\xb7'),
>         ('30', 0, '', 0, ''),
>         ('A\x00\x00\x00\x00\x00\x00\x00t\xeb', -18496, '\x19', 13, '')],
>        [('07/01/2006', 0, '', 0, ''), ('20', 0, '', 0, ''),
>         ('A[9\xb4q\x00\x00\x00\xc0\xa3', -18448, '\xc0', -3933, '\xb7'),
>         ('16', 0, '', 0, ''),
>         ('A\x00\x00\x00\x00\x00\x00\x00t\xeb', -18496, '\x19', 13, '')],
>        [('08/01/2006', 0, '', 0, ''), ('16', 0, '', 0, ''),
>         ('A[9\xb4q\x00\x00\x00\xc0\xa3', -18448, '\xc0', -3933, '\xb7'),
>         ('13', 0, '', 0, ''),
>         ('A\x00\x00\x00\x00\x00\x00\x00t\xeb', -18496, '\x19', 13, '')],
>        [('09/01/2006', 0, '', 0, ''), ('18', 0, '', 0, ''),
>         ('A[9\xb4q\x00\x00\x00\xc0\xa3', -18448, '\xc0', -3933, '\xb7'),
>         ('15', 0, '', 0, ''),
>         ('A\x00\x00\x00\x00\x00\x00\x00t\xeb', -18496, '\x19', 13, '')]],
>       dtype=[('f1', '|S10'), ('f2', '<i2'), ('f3', '|S1'), ('f4', '<i2'),
>              ('f5', '|S1')])
> 
> I have some missing entries. Is that the cause, or do I have to change
> something in the date column?

You have two problems here. The first is that you shouldn't have missing
entries, because converting an empty string to an int (or any other
numeric type) fails:

>>> int('')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: invalid literal for int():
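
One way to avoid this is to replace the empty fields with a placeholder
before converting. What follows is only a rough sketch in plain Python: the
helper name to_int and the -1 sentinel are assumptions made for
illustration, not something NumPy provides, and a different placeholder may
suit your data better.

```python
def to_int(text, missing=-1):
    """Convert a string field to int, mapping blank fields to `missing`.

    Both this helper and the -1 sentinel are illustrative assumptions.
    """
    stripped = text.strip()
    return int(stripped) if stripped else missing


row = ['05/01/2006', '33', 'A', '', 'N']  # first data row from the thread
pm10 = to_int(row[1])  # a non-empty field is converted normally
pm25 = to_int(row[3])  # the empty field falls back to the sentinel
```

Choosing a sentinel that cannot occur as a real measurement keeps the
missing entries distinguishable after the conversion.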

Second, you can't feed lists of string literals directly into the rec.array
constructor (it is not yet smart enough to digest them). You can
achieve what you want by first massaging the data a bit:

>>> ra=numpy.rec.array(datas[1:])
>>> numpy.rec.fromarrays([ra['f1'],ra['f2'],ra['f3'],ra['f4'],ra['f5']],
...                      formats='a10,i2,a1,i2,a1')
recarray([('05/01/2006', 33, 'A', 0, 'N'), ('06/01/2006', 41, 'A', 30, 'A'),
       ('07/01/2006', 20, 'A', 16, 'A'), ('08/01/2006', 16, 'A', 13, 'A')],
      dtype=[('f1', '|S10'), ('f2', '<i2'), ('f3', '|S1'), ('f4', '<i2'),
             ('f5', '|S1')])

or, a bit more simply,

>>> ca=numpy.array(datas[1:])
>>> numpy.rec.fromarrays(ca.transpose(),formats='a10,i2,a1,i2,a1')
recarray([('05/01/2006', 33, 'A', 0, 'N'), ('06/01/2006', 41, 'A', 30, 'A'),
       ('07/01/2006', 20, 'A', 16, 'A'), ('08/01/2006', 16, 'A', 13, 'A')],
      dtype=[('f1', '|S10'), ('f2', '<i2'), ('f3', '|S1'), ('f4', '<i2'),
             ('f5', '|S1')])
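
The two points above (clean the missing entries, then build the recarray
column by column) can be combined into one script. This is a minimal
sketch, not a definitive recipe: it assumes a recent NumPy under Python 3
(hence 'U' string columns instead of 'a'), and the field names, the
MISSING sentinel and the to_int_column helper are all made up for the
example.

```python
import numpy as np

# Data rows copied from the thread (header row already dropped).
datas = [
    ['05/01/2006', '33', 'A', '', 'N'],
    ['06/01/2006', '41', 'A', '30', 'A'],
    ['07/01/2006', '20', 'A', '16', 'A'],
    ['08/01/2006', '16', 'A', '13', 'A'],
]

MISSING = -1  # assumed sentinel for empty numeric fields


def to_int_column(values):
    """Hypothetical helper: build an int16 column, blanks become MISSING."""
    return np.array([int(v) if v.strip() else MISSING for v in values],
                    dtype='i2')


# Transpose the row-oriented list into one tuple per column.
columns = list(zip(*datas))

ra = np.rec.fromarrays(
    [
        np.array(columns[0], dtype='U10'),  # dates kept as strings
        to_int_column(columns[1]),          # PM10 values
        np.array(columns[2], dtype='U1'),   # PM10 quality code
        to_int_column(columns[3]),          # PM2.5 values
        np.array(columns[4], dtype='U1'),   # PM2.5 quality code
    ],
    names='date,pm10,c10,pm25,c25',         # illustrative field names
)

print(ra['pm25'])
```

Converting each column explicitly makes it obvious which fields are numeric
and where the placeholder for missing values comes from.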


Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"





