[Numpy-discussion] Automatic string length in recarray
Pierre GM
pgmdevlist at gmail.com
Tue Nov 3 12:40:00 EST 2009
On Nov 3, 2009, at 11:43 AM, David Warde-Farley wrote:
> On 2-Nov-09, at 11:35 PM, Thomas Robitaille wrote:
>
>> But if I want to specify the data types:
>>
>> np.rec.fromrecords([(1,'hello'),(2,'world')],dtype=[('a',np.int8),
>> ('b',np.str)])
>>
>> the string field is set to a length of zero:
>>
>> rec.array([(1, ''), (2, '')], dtype=[('a', '|i1'), ('b', '|S0')])
>>
>> I need to specify datatypes for all numerical types since I care
>> about int8/16/32, etc, but I would like to benefit from the auto
>> string length detection that works if I don't specify datatypes. I
>> tried replacing np.str by None but no luck. I know I can specify
>> '|S5' for example, but I don't know in advance what the string
>> length should be set to.
>
> This is a limitation of the way the dtype code works, and AFAIK
> there's no easy fix. In some code I wrote recently I had to loop
> through the entire list of records i.e. max(len(foo[2]) for foo in
> records).
As a workaround, perhaps you could use np.object instead of np.str
while defining your array. You can then get the maximum string length
by looping over the string field, as David suggested, and then use
.astype to convert your array to a dtype with a fixed-width string
field of that length.
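
A minimal sketch of the workaround described above, assuming a recent
NumPy and Python 3. The names records, tmp, maxlen and final are only
illustrative. Two things differ from the 2009 spelling: the np.object
and np.str aliases have been removed from recent NumPy releases, so the
builtin object is used instead, and the string field is cast to a
unicode 'U' dtype rather than the '|S' bytes dtype shown earlier.

```python
import numpy as np

# Illustrative input; the string lengths are not known in advance.
records = [(1, 'hello'), (2, 'world!')]

# Step 1: build the array with an object field so no string is truncated.
tmp = np.array(records, dtype=[('a', np.int8), ('b', object)])

# Step 2: loop over the string field to find the longest entry.
maxlen = max(len(s) for s in tmp['b'])

# Step 3: cast to a dtype whose string field has exactly that width.
final_dtype = [('a', np.int8), ('b', 'U%d' % maxlen)]
final = tmp.astype(final_dtype).view(np.recarray)

print(final.dtype)
print(final.b)
```

The numeric field keeps the explicitly requested int8 type, and only
the string width is computed from the data. The cost is one extra pass
over the records plus a temporary object array.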