[Tutor] Analysing genetic code (DNA) using python

Mon Mar 6 20:14:51 CET 2006

On 3/6/06, sjw28 <brainy_muppet at hotmail.com> wrote:
>
>
> I have many notepad documents that all contain long chunks of genetic
> code. They look something like this:
>
> atggctaaactgaccaagcgcatgcgtgttatccgcgagaaagttgatgcaaccaaacag
> tacgacatcaacgaagctatcgcactgctgaaagagctggcgactgctaaattcgtagaa
> agcgtggacgtagctgttaacctcggcatcgacgctcgtaaatctgaccagaacgtacgt
> ggtgcaactgtactgccgcacggtactggccgttccgttcgcgtagccgtatttacccaa
>
>
> Basically, I want to design a program using python that can open and
> read these documents. However, I want them to be read 3 base pairs at a
> time (to analyse them codon by codon) and to find the value assigned to
> each codon. An example of this is below:
>
>
> ** If the three base pairs were UUU the value assigned to it (from the
> codon value table) would be 0.296
>
>
> The program has to read all the sequence three pairs at a time, then I
> want to get all the values for each codon, multiply them together and
> put them to the power of 1 / the length of the sequence in codons
> (which is the length of the whole sequence divided by three).
>
>
> However, to make things even more complicated, the notebook sequences
> are in lowercase and the codon value table is in uppercase, so the
> sequences need to be converted into uppercase. Also, the Ts in the DNA
> sequences need to be changed to Us (again to match the codon value
> table). And finally, before the DNA sequences are read and analysed I
> need to remove the first 50 codons (i.e. the first 150 letters) and the
> last 20 codons (the last 60 letters) from the DNA sequence. I've also
> been having problems ensuring the program reads ALL the sequence 3
> letters at a time.
>
>
> I've tried various ways of doing this but keep coming unstuck along the
> way. Has anyone got any suggestions for how they would tackle this
> problem?
> Thanks for any help received!

You've got a lot of pieces to your puzzle.

I would use f.read() to read the whole file in, then a list comprehension
to keep only the codon characters (leaving out the newlines).

Slicing that list three items at a time then gives you each codon.
Something like this might get you started:

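f = open('codons.txt', 'r')

s = f.read()                        # the whole file as one string
l = [c for c in s if c != '\n']     # every character except newlines
r = len(l)

for x in range(0, r, 3):            # step through the list three at a time
    y = x + 3
    codon = l[x:y]                  # a list of up to three characters
    print(codon)

f.close()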

Each codon comes out as a list of characters, so use ''.join(codon) to turn
it back into a string. From there, you can look the codon string up in a
dictionary. Use the string method s.upper() to uppercase your codon string.
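
As a rough sketch of those steps: the snippet below joins three characters
into a string, uppercases it, swaps T for U to match the table, and looks
the result up in a dictionary. The names codon_values and to_table_key are
made up for illustration, and the table holds only the one value quoted in
the question (UUU = 0.296); a real table would need all 64 codons.

```python
# Hypothetical lookup table; only the UUU value (0.296) comes from the question.
codon_values = {"UUU": 0.296}


def to_table_key(chars):
    """Join codon characters, uppercase them, and replace T with U."""
    return "".join(chars).upper().replace("T", "U")


key = to_table_key(["t", "t", "t"])
value = codon_values.get(key)  # None if the codon is not in the table
print(key, value)
```

Using .get() rather than codon_values[key] is just one choice here: it
returns None for a codon missing from the table instead of raising KeyError.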

Basically, figure out one problem at a time. Once that works, tackle the
next problem. Or, use something someone else already wrote for you, like
Kent suggests.
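
Once the individual pieces work, they might be combined along these lines.
This is only a sketch under some assumptions: the function names and
parameters are invented, the table passed in is a placeholder with a single
value from the question, leftover letters that do not fill a whole codon are
dropped, and the exponent uses the number of codons left after trimming. It
sums logarithms instead of multiplying directly, which gives the same result
as (v1 * v2 * ... * vn) ** (1.0 / n) but avoids underflow on long sequences.

```python
import math


def split_codons(sequence):
    """Return the sequence as 3-letter codons, dropping any leftover letters."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


def geometric_mean_score(raw_text, table, skip_start=50, skip_end=20):
    """Geometric mean of codon values after trimming codons from both ends."""
    # Remove all whitespace (newlines included), then match the table's alphabet.
    sequence = "".join(raw_text.split()).upper().replace("T", "U")
    codons = split_codons(sequence)
    codons = codons[skip_start:len(codons) - skip_end]
    if not codons:
        raise ValueError("no codons left after trimming")
    # exp(mean(log(v))) equals the n-th root of the product of the values.
    log_sum = sum(math.log(table[c]) for c in codons)
    return math.exp(log_sum / len(codons))


# Tiny demonstration with trimming switched off; the table is a placeholder.
demo_table = {"UUU": 0.296}
print(geometric_mean_score("ttt\nttt\n", demo_table, skip_start=0, skip_end=0))
```

A codon that is missing from the table raises KeyError here, which is
probably what you want while you are still checking the table is complete.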

cordially,
Anna