[Numpy-discussion] Automatic string length in recarray

Pierre GM pgmdevlist at gmail.com
Tue Nov 3 12:40:00 EST 2009


On Nov 3, 2009, at 11:43 AM, David Warde-Farley wrote:

> On 2-Nov-09, at 11:35 PM, Thomas Robitaille wrote:
>
>> But if I want to specify the data types:
>>
>> np.rec.fromrecords([(1,'hello'),(2,'world')],dtype=[('a',np.int8),
>> ('b',np.str)])
>>
>> the string field is set to a length of zero:
>>
>> rec.array([(1, ''), (2, '')], dtype=[('a', '|i1'), ('b', '|S0')])
>>
>> I need to specify datatypes for all numerical types since I care about
>> int8/16/32, etc, but I would like to benefit from the auto string
>> length detection that works if I don't specify datatypes. I tried
>> replacing np.str by None but no luck. I know I can specify '|S5' for
>> example, but I don't know in advance what the string length should be
>> set to.
>
> This is a limitation of the way the dtype code works, and AFAIK
> there's no easy fix. In some code I wrote recently I had to loop
> through the entire list of records i.e. max(len(foo[2]) for foo in
> records).
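
For reference, David's scan-first approach might look roughly like the sketch below. The sample records and the field index (1 rather than the 2 in his own code) are made up for illustration, and I am assuming a current NumPy on Python 3, where a 'U<n>' unicode field is the natural counterpart of the '|S<n>' mentioned above.

```python
import numpy as np

# Hypothetical sample data; the string sits in position 1 of each record.
records = [(1, 'hello'), (2, 'world!')]

# Scan every record once to find the longest string.
max_len = max(len(rec[1]) for rec in records)

# Build the dtype with the measured width, keeping the explicit int8.
dtype = [('a', np.int8), ('b', 'U%d' % max_len)]
arr = np.rec.fromrecords(records, dtype=dtype)
```

The cost is one extra pass over the records before the array is built.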

As a workaround, perhaps you could use np.object instead of np.str
while defining your array. You can then get the maximum string length
by looping over the string field, as David suggested, and then use
.astype to convert the array to a fixed-width string dtype...
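
A minimal sketch of that idea, untested against every NumPy version, so treat it as an illustration only. The sample records are invented, the builtin `object` stands in for np.object (the alias is gone from recent NumPy releases), and 'U<n>' is assumed as the target string type on Python 3.

```python
import numpy as np

# Hypothetical sample data.
records = [(1, 'hello'), (2, 'world!')]

# Step 1: store the strings as Python objects so nothing gets truncated.
tmp = np.rec.fromrecords(records, dtype=[('a', np.int8), ('b', object)])

# Step 2: loop over the object field to find the longest string.
max_len = max(len(s) for s in tmp['b'])

# Step 3: cast to a fixed-width string field of exactly that length.
final = tmp.astype([('a', np.int8), ('b', 'U%d' % max_len)])
```

Compared with scanning the raw records first, this builds a temporary object array, but it works the same way if the data has already been loaded into a recarray.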
