[Tutor] FASTA parsing, biological sequence analysis

Sydney Shall s.shall at virginmedia.com
Tue Mar 25 16:36:12 CET 2014


I did not know about biopython, but then I am a debutant.
I tried to import biopython and got a message that the name is unknown.
I am using Enthought Python 2.7.3 on Mac OS X 10.6.8.
Where, and more importantly for me, how do I find BioPython and get it
on to my machine so that I can import it?
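Biopython's top-level package is imported as `Bio` (as in `from Bio import SeqIO` in the reply below), not as `biopython`. So `import biopython` fails even on a machine where Biopython is installed, which may explain the error above. The sketch below checks whether `Bio` can be imported by the interpreter you are actually running. The helper name `biopython_version` is hypothetical.

```python
from __future__ import print_function

import importlib


def biopython_version():
    """Return the installed Biopython version string, or None if it is not importable."""
    try:
        # Biopython's top-level package is named "Bio", not "biopython".
        bio = importlib.import_module("Bio")
    except ImportError:
        return None
    # Fall back to "unknown" if the package has no __version__ attribute.
    return getattr(bio, "__version__", "unknown")


version = biopython_version()
if version is None:
    print("Biopython is not importable from this interpreter")
else:
    print("Found Biopython", version)
```

If it reports that Biopython is not importable, it has to be installed into that same interpreter. One way is `pip install biopython`, assuming pip is available for the Enthought Python.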
With many thanks,
Sydney


On 24/03/2014 18:58, Danny Yoo wrote:
> Hi Jumana,
>
> Following up.  Let's change the subject line.  This makes it much
> easier for folks to see that this is a new topic of conversation.
>
>
> [Apologies to the others on the list for my last reply: I didn't
> realize that the subject line was wrong, or that I had quoted the
> whole long digest. I'll try to be more careful next time.]
>
>
> Jumana, I would strongly suggest separating string parsing issues from
> computational issues.  The suggestion to use Biopython is twofold: not
> only do you get to avoid writing a FASTA parser, but it gets you in
> the right mindset of processing _multiple_ sequences.
>
>
> You are encountering this problem, as your comment suggests:
>
>> I wrote a program close to what Denis suggested; however, it works only if I
>> have one sequence (one header and one sequence). I cannot modify it to work
>> if I have several sequences (like above).
>
> You want the structure of your program to do an analysis on each
> biological sequence, rather than just on each character of your
> sequence.
>
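> ###
> ### sketch below: reads FASTA records from standard input
> ###
> from __future__ import print_function  # same print output on Python 2 and 3
> from Bio import SeqIO
> import sys
>
>
> def doAnalysis(record):
>      # Called once per FASTA record: record.id is the ID from the
>      # header line, record.seq is that record's whole sequence.
>      print("I see: ", record.id, record.seq)
>      ## fill me in
>
>
> # SeqIO.parse() yields one record at a time, so the whole
> # file is never loaded into memory at once.
> for record in SeqIO.parse(sys.stdin, 'fasta'):
>      doAnalysis(record)
> ###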
>
>
> And you can fill in the details of doAnalysis() so that it does the
> nucleotide counting and only needs to worry about the contents of the
> record's single sequence.
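Here is a minimal sketch of filling in `doAnalysis()` with nucleotide counting, using `collections.Counter`. It assumes only that a record has `.id` and `.seq` attributes, as Biopython's `SeqRecord` does. `FakeRecord` is a hypothetical stand-in so that the sketch runs without Biopython installed. Upper-casing the sequence is also an assumption, made so that lowercase bases are counted together with uppercase ones.

```python
from __future__ import print_function

from collections import Counter, namedtuple

# Hypothetical stand-in for Biopython's SeqRecord: only .id and .seq are used.
FakeRecord = namedtuple("FakeRecord", ["id", "seq"])


def doAnalysis(record):
    """Count the nucleotides in one record's sequence and return the counts."""
    # str() makes this work whether record.seq is a plain string or a Bio.Seq object.
    counts = Counter(str(record.seq).upper())
    print(record.id, dict(counts))
    return counts


records = [FakeRecord("seq1", "ACGTAC"), FakeRecord("seq2", "ggca")]
for record in records:
    doAnalysis(record)
```

With Biopython installed, the records yielded by `SeqIO.parse()` can be passed to the same function unchanged. It never needs to know how many sequences the file contains.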
>
> In bioinformatics contexts, you must either manage memory consumption
> yourself or use libraries that naturally lend themselves to doing
> things in a memory-careful way; otherwise your computer will start
> swapping. That is, unless your data sets are trivial, which I am
> guessing is not the case.
>
> In short, please use the BioPython library.  It will handle a lot of
> issues that you are not considering, including memory consumption and
> correct, stream-oriented parsing of FASTA.
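The sketch below makes "stream-oriented" concrete with a hand-rolled generator, `read_fasta` (a hypothetical name, not part of Biopython). It yields one (header, sequence) pair at a time, so only the current record is held in memory. This is an illustration of the idea only. Biopython's `SeqIO.parse()` works the same way and handles details this sketch ignores.

```python
from __future__ import print_function

import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream, one record at a time.

    Only the current record's lines are kept in memory. Minimal sketch:
    no validation, and any text before the first '>' header is ignored.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


example = io.StringIO(u">seq1 first\nACGT\nAC\n>seq2\nGGCA\n")
for header, seq in read_fasta(example):
    print(header, len(seq))
```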


-- 
Sydney Shall

