[Tutor] FASTA FILE SUB-SEQUENCE EXTRACTION

syed zaidi syedzaidi85 at hotmail.co.uk
Tue Mar 8 07:19:47 EST 2016


Well, fasta is a file format used by biologists to store biological sequences.
The format is as under:

> sequence information (sequence name, sequence length etc)
genomic sequence
> sequence information (sequence name, sequence length etc)
genomic sequence

I want to match the name of sequence with another list of sequence names and splice the sequence by the provided list of start and end sites for each sequence,
so the pseudo code could be:

if line starts with '>':
    match the header name with sequence name:
        if sequence name found:
            splice from the given start and end positions of that sequence

The code I have devised so far is:

import os
with open('E:/scaftig.sample - Copy.scaftig','r') as f:
    header = f.readline()
    header = header.rstrip(os.linesep)
    sequence = ''
    for line in f:
        line = line.rstrip('\n')
        if line[0] == '>':
            header = header[:]
            print header

        if line[0] != '>':
            sequence+= line
            print sequence, len(sequence)

I would appreciate if you can help
Thanks
Best Regards
Ali
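
One possible sketch of the steps described above, written in Python 3 syntax (the code above uses Python 2 print statements). It is only a rough outline under several assumptions that the thread does not confirm: the sequence name is taken to be the first whitespace-separated word after '>', the coordinate file is taken to have four whitespace-separated columns (Scaf_name, Gene_name, DS_St, DS_En) with a header row, and the start/end positions are taken to be 1-based and inclusive. The function names read_fasta, read_coords and extract are made up for illustration.

```python
from collections import defaultdict


def read_fasta(lines):
    """Return {name: sequence}. ASSUMPTION: name is the first word after '>'."""
    sequences = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('>'):
            # A new header closes the previous record, if there was one.
            if name is not None:
                sequences[name] = ''.join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ''
            chunks = []
        elif name is not None:
            # Sequence data may span several lines; collect all of them.
            chunks.append(line)
    if name is not None:
        sequences[name] = ''.join(chunks)
    return sequences


def read_coords(lines):
    """Return {scaf_name: [(gene_name, start, end), ...]}.

    ASSUMPTION: four whitespace-separated columns; rows whose last two
    columns are not integers (such as the header row) are skipped.
    """
    coords = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue
        scaf, gene, start, end = fields
        if not (start.isdigit() and end.isdigit()):
            continue
        coords[scaf].append((gene, int(start), int(end)))
    return coords


def extract(sequences, coords):
    """Yield (scaf_name, gene_name, start, end, subsequence) tuples."""
    for scaf, genes in coords.items():
        seq = sequences.get(scaf)
        if seq is None:
            continue  # no FASTA header matches this scaf_name
        for gene, start, end in genes:
            # ASSUMPTION: positions are 1-based and inclusive, so the
            # Python slice starts at start - 1 and stops at end.
            yield scaf, gene, start, end, seq[start - 1:end]
```

Both readers accept any iterable of lines, so the two files can be opened with open() and passed in directly, and the tuples from extract() printed or written out one by one. If the positions turn out to be 0-based, only the slice in extract() would need to change.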
> Date: Tue, 8 Mar 2016 03:11:42 -0500
> Subject: Re: [Tutor] FASTA FILE SUB-SEQUENCE EXTRACTION
> From: wolfrage8765 at gmail.com
> To: syedzaidi85 at hotmail.co.uk
> 
> What is FASTA? This seems very specific. Do you have any code thus far
> that is failing?
> 
> On Tue, Mar 8, 2016 at 2:33 AM, syed zaidi <syedzaidi85 at hotmail.co.uk> wrote:
> > Hello all,
> > I am stuck in a problem, I hope someone can help me out. I have a FASTA file with multiple sequences and another file with the gene coordinates.
> >
> > SAMPLE FASTA FILE:
> > >EBM_revised_C2034_1  length=611
> > GCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATT
> > >EBM_revised_C2104_1  length=923
> > TCCGAGGGCGGTGGGATGTTGGTGCTGCAGCGGCTTTCGGATGCGCGGCGGTTGGGTCATCCGGTGTTGGCGGTGGTGGTCGGGTCGGCGGTTAATCAGGATGGGGCGTCGAATGGGTTGACCGCGCCTAATGGTCCTTCGCAGCAGCGGGTGGTGCGGGCGGCGTTGGCCAATGCCGGGTTGAGCGCGGCCGAGGTGGATGTGGTGGAGGGGCATGGGACCGGGACCACGTTGGGGGATCCGATTGAGGCTCAGGCGTTGTTGGCCACTTATGGGCAAGATCGGGGGGAGCCGGGAGAACCTTTGTGGTTGGGGTCGGTGAA
> >  GTCGAATATGGGTCATACGCAGGCCGCGGCGGGGGTGGCCGGGGTGATCAAGATGGTGTTGGCGATGCGCCATGAGCTGTTGCCGGCGACGTTGCACGTGGATGTGCCTAGCCCGCATGTGGATTGGTCGGCGGGGGCGGTGGAGTTGTTGACCGCGCCGCGGGTGTGGCCTGCTGGTGCTCGGACGCGTCGTGCGGGGGTGTCGTCGTTTGGGATTAGTGGCACTAATGCGCATGTGATTATCGAGGCGGTGCCGGTGGTGCCGCGGCGGGAGGCTGGTTGGGCGGGGCCGGTGGTGCCGTGGGTGGTGTCGGCGAAGTCGGAGTCGGCGTTGCGGGGGCAGGCGGCTCGGTTGGCCGCGTACGTGCGTGGCGATGATGGCCTCGATGTTGCCGATGTGGGGTGGTCGTTGGCGGGTCGTTCGGTTTTTGAGCATCGGGCGGTGGTGGTTGGCGGGGACCGTGATCGGTTGTTGGCCGGGCTCGATGAGCTGGCGGGTGACCAGTTGGGCGGCTCGGTTGTTCGGGGCACGGCGACTGCGGCGGGTAAGACGGTGTTCGTCTTCCCCGGCCAAGGCTCCCAATGGCTGGGCATGGGAAT
> >
> > GENE COORD FILE
> > Scaf_name              Gene_name  DS_St  DS_En
> > EBM_revised_C2034_1    gene1_1    33     99
> > EBM_revised_C2034_1    gene1_1    55     100
> > EBM_revised_C2034_1    gene1_1    111    150
> > EBM_revised_C2104_1    gene1_1    44     70
> >
> > I want to perform the following steps:
> > compare the scaf_name with the header of the fasta sequence;
> > if the header matches, then process the sequence and extract the sub-sequence by the provided start and end positions.
> >
> > I would appreciate if someone can help
> > Thanks
> > Best Regards
> >
> > Ali
> >
> >> _______________________________________________
> >> Tutor maillist  -  Tutor at python.org
> >> To unsubscribe or change subscription options:
> >> https://mail.python.org/mailman/listinfo/tutor
> >