processing the genetic code with python?

nuttydevil sjw28 at sussex.ac.uk
Mon Mar 6 10:03:24 EST 2006


I have many notepad documents that all contain long chunks of genetic
code. They look something like this:

atggctaaactgaccaagcgcatgcgtgttatccgcgagaaagttgatgcaaccaaacag
tacgacatcaacgaagctatcgcactgctgaaagagctggcgactgctaaattcgtagaa
agcgtggacgtagctgttaacctcggcatcgacgctcgtaaatctgaccagaacgtacgt
ggtgcaactgtactgccgcacggtactggccgttccgttcgcgtagccgtatttacccaa

Basically, I want to write a Python program that can open and read
these documents. However, I want them to be read 3 base pairs at a
time (to analyse them codon by codon), looking up the value assigned
to each codon in a codon value table. An example of this is below:

** If the three base pairs were UUU the value assigned to it (from the
codon value table) would be 0.296
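
A minimal sketch of the codon-by-codon lookup described above, assuming
the sequence is already an uppercase RNA string. The names CODON_VALUES
and split_codons are hypothetical, and only the UUU value comes from
this post; the other table entries are placeholders.

```python
# Hypothetical codon value table. Only UUU = 0.296 is taken from the post;
# the other two entries are made-up placeholders for illustration.
CODON_VALUES = {"UUU": 0.296, "AUG": 1.0, "GCU": 0.5}


def split_codons(seq):
    """Return consecutive, non-overlapping 3-letter chunks of seq.

    Any trailing partial codon (1 or 2 leftover letters) is dropped.
    """
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


codons = split_codons("AUGGCUUUUGC")       # the final "GC" is incomplete
values = [CODON_VALUES[c] for c in codons]  # KeyError if a codon is not in the table
```

Stepping through range(0, usable, 3) is what guarantees that the whole
sequence is consumed three letters at a time with no overlap.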

The program has to read the whole sequence three base pairs at a time,
then I want to get the values for all the codons, multiply them
together and raise the product to the power of 1 / the length of the
sequence in codons (which is the length of the whole sequence divided
by three). In other words, I want the geometric mean of the codon
values.
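
The calculation above can be sketched as follows, assuming every codon
value is strictly positive. The function name geometric_mean is
hypothetical. Summing logarithms is a substitution for the literal
"multiply then take the root": it gives the same result but avoids the
product underflowing to zero on long sequences.

```python
import math


def geometric_mean(values):
    """Return (v1 * v2 * ... * vn) ** (1 / n) for positive values.

    Computed as exp(mean(log(v))) so that a long run of small values
    does not underflow to 0.0. Assumes every value is > 0.
    """
    if not values:
        raise ValueError("need at least one codon value")
    log_sum = sum(math.log(v) for v in values)
    return math.exp(log_sum / len(values))
```

For example, geometric_mean([2.0, 8.0]) is the square root of 16.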

However, to make things even more complicated, the notepad sequences
are in lowercase and the codon value table is in uppercase, so the
sequences need to be converted into uppercase. Also, the Ts in the DNA
sequences need to be changed to Us (again to match the codon value
table). And finally, before the DNA sequences are read and analysed I
need to remove the first 50 codons (i.e. the first 150 letters) and the
last 20 codons (the last 60 letters) from the DNA sequence. I've also
been having problems ensuring the program reads ALL the sequence 3
letters at a time.
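
A minimal sketch of the clean-up steps above, assuming the file
contents have already been read into one string (for instance with
open(path).read(), where path is whatever the document is called). The
function name prepare_sequence and its parameter names are hypothetical.

```python
def prepare_sequence(raw, skip_start=150, skip_end=60):
    """Clean raw file text and trim the ends, as described in the post.

    - removes line breaks and any other whitespace
    - converts to uppercase and replaces T with U
    - drops the first skip_start letters (50 codons) and the last
      skip_end letters (20 codons)
    Returns an empty string if the sequence is too short to trim.
    """
    seq = "".join(raw.split()).upper().replace("T", "U")
    if len(seq) <= skip_start + skip_end:
        return ""
    return seq[skip_start:len(seq) - skip_end]
```

Removing the line breaks before slicing matters: otherwise the newline
characters would be counted as letters and shift the reading frame.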

I've tried various ways of doing this but keep coming unstuck along the
way. Has anyone got any suggestions for how they would tackle this
problem?
Thanks for any help received!



