[Numpy-discussion] Automatic string length in recarray

Wed Nov 4 12:38:17 EST 2009

On Tue, Nov 3, 2009 at 11:43 AM, David Warde-Farley <dwf at cs.toronto.edu> wrote:

> On 2-Nov-09, at 11:35 PM, Thomas Robitaille wrote:
>
> > But if I want to specify the data types:
> >
> > np.rec.fromrecords([(1,'hello'),(2,'world')],dtype=[('a',np.int8),
> > ('b',np.str)])
> >
> > the string field is set to a length of zero:
> >
> > rec.array([(1, ''), (2, '')], dtype=[('a', '|i1'), ('b', '|S0')])
> >
> > I need to specify datatypes for all numerical types since I care about
> > int8/16/32, etc, but I would like to benefit from the auto string
> > length detection that works if I don't specify datatypes. I tried
> > replacing np.str by None but no luck. I know I can specify '|S5' for
> > example, but I don't know in advance what the string length should be
> > set to.
>
> This is a limitation of the way the dtype code works, and AFAIK
> there's no easy fix. In some code I wrote recently I had to loop
> through the entire list of records to find the longest string, e.g.
> max(len(foo[2]) for foo in records), and then build the dtype from that.
>
>
Not to shamelessly plug my own project ... but more robust string type
detection is one of the features of Tabular
(http://bitbucket.org/elaine/tabular/), and is one of the (kinds of) reasons
we wrote the package. Perhaps Tabular could be useful to you?
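
If you would rather stay within plain NumPy, the loop David describes can
be sketched roughly as below. This is only an illustration, not code from
his project or from Tabular: the sample records, the column index 1, and
the names `width` and `arr` are assumptions made up for the example, and
it uses a 'U<n>' (unicode) field where older code would use '|S<n>'.

```python
import numpy as np

# Hypothetical sample data: (integer, string) records.
records = [(1, 'hello'), (2, 'world!')]

# Scan the string column once to find the longest value.
width = max(len(rec[1]) for rec in records)

# Keep the explicit integer type and size the string field from the data.
# 'U<n>' is a fixed-width unicode field; use 'S<n>' for byte strings.
dtype = [('a', np.int8), ('b', 'U%d' % width)]

arr = np.rec.fromrecords(records, dtype=dtype)
```

The cost is one extra pass over the records before the array is built,
which is the price of not knowing the string length in advance.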

Dan