[Numpy-discussion] genfromtxt view with object dtype

Wed Feb 4 23:51:34 EST 2009

OK, Brent, try r6341.
I fixed genfromtxt for cases like yours (an explicit dtype involving
np.object).
Note that the fix won't work if the dtype is nested and involves
np.object (we would hit the problem of renaming fields that we observed...).
Let me know how it goes.
P.

On Feb 4, 2009, at 4:03 PM, Brent Pedersen wrote:

> On Wed, Feb 4, 2009 at 9:36 AM, Pierre GM <pgmdevlist at gmail.com> wrote:
>>
>> On Feb 4, 2009, at 12:09 PM, Brent Pedersen wrote:
>>
>>> hi, i am using genfromtxt, with a dtype like this:
>>> [('seqid', '|S24'), ('source', '|S16'), ('type', '|S16'), ('start',
>>> '<i4'), ('end', '<i4'), ('score', '<f8'), ('strand', '|S1'), ('phase',
>>> '<i4'), ('attrs', '|O4')]
>>
>> Brent,
>> Please post a simple, self-contained example with a few lines of the
>> file you want to load.
>>
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion at scipy.org
>> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> hi pierre, here is an example.
> thanks,
> -brent
>
> ######################
>
> import numpy as np
> from cStringIO import StringIO
>
> gffstr = """\
> ##gff-version 3
> 1\tucb\tgene\t2234602\t2234702\t.\t-\t.\tID=grape_1_2234602_2234702;match=EVM_prediction_supercontig_1.248,EVM_prediction_supercontig_1.248.mRNA
> 1\tucb\tgene\t2300292\t2302123\t.\t+\t.\tID=grape_1_2300292_2302123;match=EVM_prediction_supercontig_244.8
> 1\tucb\tgene\t2303615\t2303967\t.\t+\t.\tID=grape_1_2303615_2303967;match=EVM_prediction_supercontig_244.8
> 1\tucb\tgene\t2303616\t2303966\t.\t+\t.\tParent=grape_1_2303615_2303967
> 1\tucb\tgene\t3596400\t3596503\t.\t-\t.\tID=grape_1_3596400_3596503;match=evm.TU.supercontig_167.27
> 1\tucb\tgene\t3600651\t3600977\t.\t-\t.\tmatch=evm.model.supercontig_1217.1,evm.model.supercontig_1217.1.mRNA
> """
>
> dtype = {'names' :
>                  ('seqid', 'source', 'type', 'start', 'end',
>                    'score', 'strand', 'phase', 'attrs') ,
>        'formats':
>                  ['S24', 'S16', 'S16', 'i4', 'i4', 'f8',
>                      'S1', 'i4', 'S128']}
>
> #OK with S128 for attrs
> print np.genfromtxt(StringIO(gffstr), dtype = dtype)
>
>
>
> def _attr(kvstr):
>    pairs = [kv.split("=") for kv in kvstr.split(";")]
>    return dict(pairs)
>
> # change S128 to object to have col attrs as dictionary
> dtype['formats'][-1] = 'O'
> converters = {8: _attr }
> #NOT OK
> print np.genfromtxt(StringIO(gffstr), dtype = dtype,  
> converters=converters)
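
For readers who cannot use the genfromtxt fix (for instance with a nested
dtype), here is a minimal sketch of building the same kind of record array by
hand, with the last column stored as a dictionary in an object field. It is
not the genfromtxt code path and not anything from r6341. The assumptions are
mine: it targets Python 3 (so it uses io.StringIO and 'U' string fields instead
of cStringIO and 'S'), the helper names parse_attrs and load_gff are
hypothetical, and a "." in the score or phase column is mapped to NaN or -1
respectively.

```python
import io

import numpy as np

# Two records in the same tab-separated layout as the example above.
GFF_TEXT = (
    "##gff-version 3\n"
    "1\tucb\tgene\t2300292\t2302123\t.\t+\t.\t"
    "ID=grape_1_2300292_2302123;match=EVM_prediction_supercontig_244.8\n"
    "1\tucb\tgene\t2303616\t2303966\t.\t+\t.\t"
    "Parent=grape_1_2303615_2303967\n"
)

# Same field names as the original dtype; 'U' instead of 'S' is an
# assumption made for Python 3 text input. The last field is an object.
GFF_DTYPE = np.dtype([
    ("seqid", "U24"), ("source", "U16"), ("type", "U16"),
    ("start", "i4"), ("end", "i4"), ("score", "f8"),
    ("strand", "U1"), ("phase", "i4"), ("attrs", "O"),
])


def parse_attrs(kvstr):
    """Turn 'k1=v1;k2=v2' into a dict, splitting on the first '=' only."""
    return dict(kv.split("=", 1) for kv in kvstr.split(";") if kv)


def load_gff(handle):
    """Read tab-separated records into a structured array (hypothetical helper)."""
    rows = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comment/header lines
        f = line.split("\t")
        rows.append((
            f[0], f[1], f[2], int(f[3]), int(f[4]),
            np.nan if f[5] == "." else float(f[5]),  # assumed: '.' -> NaN
            f[6],
            -1 if f[7] == "." else int(f[7]),        # assumed: '.' -> -1
            parse_attrs(f[8]),
        ))
    # A list of tuples maps directly onto a flat structured dtype,
    # including the object field.
    return np.array(rows, dtype=GFF_DTYPE)


records = load_gff(io.StringIO(GFF_TEXT))
print(records["start"])
print(records["attrs"][1]["Parent"])
```

Building the rows as tuples first avoids creating the array with one dtype and
then viewing it as another, which is the step that cannot work once an object
field is involved.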