[Tutor] Phyton script for fasta file (seek help)

Peter Otten __peter__ at web.de
Sun Mar 24 20:24:48 CET 2013


michelle_low wrote:

> Hi Peter,
> Thanks for your help.

You're welcome. In the future, please send your mail to the list, not to an 
individual poster. That way more people get a chance to read it and you are 
more likely to get help.

> I have followed your advice:
> 
> 
> -removed the line
> 
> from sets import Set
> 
> completely and replace Set with set
> 
> 
> -Replace the line
> 
> > line = file("sequence.fna", "r")
> 
> with
> 
> import sys
> line = sys.stdin

> However, when I tried to run the script using command ./script.py <
> sequence.fna . nothing came up. The code just does't run. I used to be 
able
> to execute the code and it printed out the list. Now I'm clueless and 
stuck again.

If it worked to your satisfaction before I suggest that you doublecheck you 
didn't accidentally make changes other than those I suggested.

> Can you please look at my script below? Thanks a million

I'd rather not ;) Seriously, without a clear notion of what the script is 
supposed to do it is hard to fix its behaviour...

> #!/usr/local/bin/python
> 
> 
> import math
> import sys
> 
> 
> line = sys.stdin
> 
> 
> for x in line:
>   if x [0] == ">" :

I suppose that's a ">" in your actual code. Otherwise the condition is 
always False -- which would explain why you see nothing.
 
> #determine the length of sequences
>     s=line.next()
>     s=s.rstrip()
>     length = len(s)
> 
> 
> # determine the GC content
>     G = s.count('G')
>     C = s.count('C')
>     GC= 100 * (float(G + C) / length)
> 
> 
> 
> 
>     stList = list(s)
>     alphabet = list(set(stList))
> 
> 
>     freqList = []
>     for symbol in alphabet:
>       ctr = 0
>       for sym in stList:
>         if sym == symbol:
>             ctr += 1
> 
>    freqList.append(float(ctr)/len(stList))

The above line gave me an IndentationError. It should probably be

>       freqList.append(float(ctr)/len(stList))

Had you answered Alan (publicly) there'd be no need to guess...

> # Shannon entropy
>   ent = 0.0
>   for freq in freqList:
>     ent = ent + freq * math.log(freq, 2)
>   ent = -ent
> 
> 
>   print x
>   print "Length:" , length
>   print "G+C:" ,round(GC),"%"
>   print 'Shannon entropy:'
>   print ent
>   print 'Minimum number of bits required to encode each symbol:'
>   print int(math.ceil(ent))

PS: if you reuse code from someone else: 
<http://code.activestate.com/recipes/577476-shannon-entropy-calculation/>
it's a good idea to give credits.



More information about the Tutor mailing list