[Tutor] Need advise on protein sequence alignment code in python 2.7 giving wrong output

Белякова Анастасия nasbelyakova at gmail.com
Tue Aug 11 10:53:38 CEST 2015


I am trying to develop code to perform global sequence alignment in python
2.7 for proteins with gap penalty of -5 and blossom64 scoring matrix. Code
I have now give wrong output.

When I give 'PLEASANTLY' and 'MEANLY' as input, I get 'PLEASANTLY' and
'M-EAN--L-Y' as output instead of 'PLEASANTLY' and '-MEA--N-LY'. Blossom62
scoring matrix is given as dictionary {'A': {'A': 4, 'C': 0, etc. -5 is
insertion/deletion penalty.

Here is a code:

def scoring(v,w,blossom62):
   S = [[0]*(len(w)+1) for _ in xrange(len(v)+1)]
   for i in range(0,len(w)+1):
       S[0][i]=i*-5
   for j in range (0,len(v)+1):
       S[j][0]=j*-5
   for i in xrange(len(v)):
       for j in xrange(len(w)):
             S[i+1][j+1] =
max(S[i+1][j]-5,S[i][j+1]-5,S[i][j]+blossom62[v[i]][w[j]] )
   backtrack=[[0]*(len(w)) for _ in xrange(len(v))]
   for i in xrange(len(v)):
      for j in xrange(len(w)):
         if max(S[i][j-1],S[i-1][j],S[i-1][j-1]) == S[i-1][j]:
             backtrack[i][j]= 0
         elif max(S[i][j-1],S[i-1][j],S[i-1][j-1]) == S[i][j-1]:
             backtrack[i][j]= 1
         elif max(S[i][j-1],S[i-1][j],S[i-1][j-1]) == S[i-1][j-1]:
             backtrack[i][j]= 2
  def insert_indel(word,pos):
      return word[:pos]+'-'+word[pos:]
   v_aligned, w_aligned = v, w
   i,j =len(v)-1,len(w)-1
   while i*j!=0:
      if backtrack[i][j]==0:
          i-=1
          w_aligned=insert_indel(w_aligned,j)
      elif backtrack[i][j]==1:
          j-=1
          v_aligned=insert_indel(v_aligned,i)
      elif backtrack[i][j]==2:
          j-=1
          i-=1
   print v_aligned
   print w_aligned

What is wrong and how to fix it?


More information about the Tutor mailing list