Comparing 2 sequences of DNA: output the silent and non-silent mutations

Sat Oct 29 19:48:38 EDT 2016

On 2016-10-29 20:38, dishaacharya96 at gmail.com wrote:
> Code:
>
> A = 0
> B= 0
> i=0
> j=0
> # opening the files
> infile1 = open("CDSsrebf1.txt")
> infile2 = open("PROsrebf1.txt")
> infile3 = open("mutant.txt")
> print(" 1st line of WT SREBF1 (CDS):",infile1.readline())
> print ("1st line of mutant protein of SREBF1: ", infile3.readline())
> print ("1st line of protein of SREBF1: ",infile2.readline())
> # -----------------------------------------------------
> # reading the nucleotide sequence for WT SREBF1
> seq1 = infile1.read()
> seq1 = seq1.replace('\n', '')
> len1 = len(seq1)
> # --------------------------------------------
> # reading the mutant file
> mutant = infile3.read()
> mutant = mutant.replace('\n', '')
> #---------------------------------------
> # reading the protein file
> # which is used to check our codon dictionary
> wtPRO = infile2.read()
> wtPRO = wtPRO.replace('\n', '')
> #---------------------------------------------------------
> # setting up the dictionary
> letters = ('G', 'A', 'C', 'T')
> codes = []
> for a in letters :
>     for b in letters :
>         for c in letters :
>             codes.append(a + b + c)
> aa = 'ggggeeddaaaavvvvrrsskknnttttmiiirrrrqqhhppppllllwxccxxyyssssllff'
> aa = aa.upper()
> codons = {}
> for i in range(64) :
>     codons[codes[i]] = aa[i]
> #------------------------------------------------------------------
> # making the protein from the WT SREBF1, which is seq1
> protein = ''
> for i in range(0, len(seq1), 3) :
>     codon = seq1[i:i+3]
>     aminoacid = codons[codon]
>     protein += aminoacid
> # -----------------------------------------------------------
> # making the protein from the mutant SREBF1, which is mutant
> mutantPRO = ''
> for i in range(0, len(mutant), 3) :
>     codon = mutant[i:i+3]
>     aminoacid = codons[codon]
>     mutantPRO += aminoacid
> # ----------------------------------------------------------
> # quick check if WT and mutant are the same for the protein
> if protein == mutantPRO :
>     print ('The protein sequences are the same.')
> else :
>     print ('The protein sequences are different.')
> # --------------------------------------------------------
> # Printing the differences in the format XiY
> # which means WT amino acid X at position i changed to mutant amino acid Y
> print ('-------------------------')
> print ('The mutations are:')
>
> for i in range (len(protein) & len(seq1)) :
>     if protein[i] != mutantPRO[i] :
>         print (protein[i] + str(i) + mutantPRO[i])
>         A+= 1
>     else:
>         if seq1[i:i+3] != mutant[i:i+3]:
>             print(protein[i] + str(i) + mutantPRO[i] +' Silent mutation ')
>             print(seq1[i:i+3] + mutant[i:i+3])
>             B+= 1
>
>
> print("Number of non-silent mutations are: ",A)
> print("Number of silent mutations are: " , B)
>
>
> The output should be:
>
> The mutations are:
> M0I
> D1D silent mutation C5T
> V291L
>
>
> I don't know how to print the C5T part.
>
> Thank you for helping me!
>
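For the C5T part: judging by your expected output, it looks like
"wild-type base, 0-based position in the CDS, mutant base". If that
guess is right, you can compare the two codons base by base. Note also
that in your loop i counts amino acids, so codon i is
seq1[3*i:3*i+3], not seq1[i:i+3]. Here is a rough sketch; the function
name nucleotide_changes and the short example sequences are made up
for illustration, they are not from your files:

```python
def nucleotide_changes(wt_codon, mut_codon, codon_index):
    """Return strings like 'C5T' for each base that differs in one codon.

    Assumes both codons are 3 bases long and that positions are
    0-based offsets into the whole coding sequence.
    """
    changes = []
    for offset, (wt_base, mut_base) in enumerate(zip(wt_codon, mut_codon)):
        if wt_base != mut_base:
            position = 3 * codon_index + offset
            changes.append(wt_base + str(position) + mut_base)
    return changes


# Made-up example: codon 1 is GAC in the wild type and GAT in the
# mutant. Both code for D, so the change is silent.
wt = "ATGGAC"
mut = "ATGGAT"
for i in range(len(wt) // 3):
    wt_codon = wt[3 * i:3 * i + 3]
    mut_codon = mut[3 * i:3 * i + 3]
    if wt_codon != mut_codon:
        print(i, " ".join(nucleotide_changes(wt_codon, mut_codon, i)))
```

In your own loop you would call something like this in the "silent"
branch and append the result to the line you already print.
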
I don't understand what the expression "len(protein) & len(seq1)" is 
supposed to be doing. It's a bitwise 'and' of the lengths, which just 
looks wrong!
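
To show what I mean, here is a small sketch with made-up lengths (6
amino acids, 18 bases). I'm assuming what you actually want is to
visit every position that exists in both proteins; the helper name
positions_to_compare is only for illustration:

```python
protein_len = 6    # made-up: 6 amino acids
cds_len = 18       # the matching 18 nucleotides

# Bitwise AND works on the binary digits:
# 0b00110 & 0b10010 == 0b00010
print(protein_len & cds_len)   # prints 2, so the loop would stop early


def positions_to_compare(protein, mutant_protein):
    """Indices present in both proteins (hypothetical helper)."""
    return range(min(len(protein), len(mutant_protein)))


print(list(positions_to_compare("MDVLKR", "IDVLKR")))
```

With range(len(protein)), or the min() of the two protein lengths if
they might differ, the loop covers every amino acid position.
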