[Numpy-discussion] how to store variable length string in array and get single char by its position?

Bruce Southey bsouthey at gmail.com
Tue Sep 14 14:02:10 EDT 2010


  On 09/14/2010 11:33 AM, Keith Goodman wrote:
> On Tue, Sep 14, 2010 at 9:25 AM, kee chen<keekychen.shared at gmail.com>  wrote:
>> Dear All,
>>
>> Suppose I have a list of items such as DNA sequences:
>>
>> 1  ATGCATGCAATTGGCC
>> 2  ATGCATGCAATTGGCCATCD
>> 3  CATGCAATTGGCCCCCCCCC
>> ......
>> 100000 CATGCAAATTGGCCCCCCCCC
>>
>> The length of each string is not fixed and may change later. How can
>> I store the above in a numpy array (including the ID) so that a single
>> character is easy to get?
>>
>> for example
>> 1.  ATGCATGCAATTGGCC
>> I want to get the first T, so I use something like array[1][1], meaning
>> A[T]G...... If I want to update the 3rd position, can I use array[1][2]
>> = T to change AT[G]C... to AT[T]C...?
> How about using a python list:
>
> >> array = ['ATGC', 'CATGA', 'A']
> >> array[0][1]
>    'T'
Variable length is incompatible with numpy:
'NumPy arrays have a fixed size at creation, unlike Python lists (which 
can grow dynamically).'
http://docs.scipy.org/doc/numpy/user/whatisnumpy.html

So you have to allocate space for the longest sequence, although you
could try one of SciPy's sparse matrices (with nucleotides/amino acids
coded as numbers starting at 1, so that zeros mark the empty areas).
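
A rough sketch of the padded approach, assuming each nucleotide is
stored as a small integer and 0 is reserved for padding. The code table
CODES and the helper encode() are made-up names for illustration, not
part of NumPy:

```python
import numpy as np

# Hypothetical code table: 0 is reserved for "empty" padding positions.
CODES = {"A": 1, "C": 2, "G": 3, "T": 4}
LETTERS = {v: k for k, v in CODES.items()}

def encode(seqs):
    """Pack variable-length sequences into a zero-padded 2-D uint8 array."""
    width = max(len(s) for s in seqs)
    out = np.zeros((len(seqs), width), dtype=np.uint8)
    for i, s in enumerate(seqs):
        out[i, :len(s)] = [CODES[ch] for ch in s]
    return out

seqs = ["ATGC", "CATGA", "A"]
arr = encode(seqs)

second = LETTERS[int(arr[0, 1])]   # second nucleotide of the first sequence
arr[0, 2] = CODES["T"]             # overwrite the third position in place
lengths = (arr != 0).sum(axis=1)   # true length of each row, ignoring padding
```

The array cannot grow, so a sequence that later becomes longer than the
current width means allocating a new, wider array and copying.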

If lists (or dictionaries) don't work for you, you might want to explore 
bioinformatics packages like 'pygr' (http://code.google.com/p/pygr/) and 
Biopython (http://biopython.org/wiki/Main_Page) or try more general 
approaches such as HDF5 and PyTables.
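
For the plain-Python route, note that str objects are immutable, so an
assignment like array[0][2] = 'T' fails on a list of strings. One
possible workaround, sketched here with made-up IDs and data, is a
dictionary of bytearray objects keyed by ID:

```python
# Sequences keyed by ID; bytearray is mutable and resizable, unlike str.
seqs = {
    1: bytearray(b"ATGCATGCAATTGGCC"),
    2: bytearray(b"ATGCATGCAATTGGCCATCG"),
}

first_t = chr(seqs[1][1])        # indexing a bytearray yields an int
seqs[1][2] = ord("T")            # AT[G]C... becomes AT[T]C...
seqs[1].extend(b"GG")            # a sequence can grow later
text = seqs[1].decode("ascii")   # back to a regular string
```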

Bruce