Python module for DNA to amino acid and reverse complement translation.

David Mathog mathog at seqaxp.bio.caltech.edu
Tue Sep 5 23:39:43 EDT 2000


In article <etdd7ioult0.fsf at w20-575-31.mit.edu>, Alex <cut_me_out at hotmail.com> writes:
>
>
>
>
>Hi.  Here is a python wrapper around some simple C functions that
>translate DNA sequences into amino acid residues and give their reverse
>complements.  I guess it's something like 10 times faster than the pure
>python versions, but I haven't done any benchmarking.
>
>
>No doubt there is already a module out there that does these things.
>Apologies to whoever's work I'm duplicating.
>

Just for kicks you might want to compare this one with yours (for
speed):

  http://seqaxp.bio.caltech.edu/pub/SOFTWARE/FASTTRANS.C


I use this to translate genpept or nr on the fly into either full
translated frames or the set of ORFs longer than (for instance) 75 AA.
It will handle a single DNA sequence entry up to 1 MB (use another program
in the pipe to fragment larger sequences if you expect to encounter them).


$ fasttrans
 usage (UNIX):     fasttrans 123456 [minAA] <in.nfa >out.pfa
 usage (OpenVMS):  pipe fasttrans 123456 [minAA] <in.nfa >out.pfa
    input is a fasta dna sequence via stdin
    output is the translated protein sequence via stdout
    Specify the set of frames to translate on command line
       1,2,3 are the 3 forward frames
       4,5,6 are the 3 reverse frames
    minAA is an optional value.  If present and greater than zero
       it emits each ORF that has at least that many AA residues
       in it as a separate fasta fragment.  If not present or set to zero
       or less the entire translated frame is emitted
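
For anyone who wants a pure Python baseline to time against, the frame
and minAA arguments described above can be sketched roughly as follows.
This is not a port of FASTTRANS.C. The function names are made up for
illustration, the standard genetic code is assumed, frames 4-6 are assumed
to be offsets 0-2 on the reverse complement, and an "ORF" is assumed to
be any stretch between stop codons (the C program may define these
differently). It uses str.maketrans, so it needs Python 3.

```python
# Illustrative sketch only; names and conventions here are assumptions,
# not taken from FASTTRANS.C.

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate_frame(dna, frame):
    """Translate one frame: 1-3 forward, 4-6 on the reverse complement."""
    dna = dna.upper()
    if frame > 3:
        # Assumption: frame 4 starts at offset 0 of the reverse complement.
        dna = reverse_complement(dna)
        frame -= 3
    start = frame - 1
    # Codons containing anything other than A, C, G, T become 'X'.
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(start, len(dna) - 2, 3)
    )


def orfs(protein, min_aa):
    """Split a translated frame at stops, keeping pieces of >= min_aa residues.

    With min_aa <= 0 the whole translated frame is returned unsplit.
    """
    if min_aa <= 0:
        return [protein]
    return [piece for piece in protein.split("*") if len(piece) >= min_aa]
```

Something like orfs(translate_frame(seq, 4), 75) would then correspond
loosely to running "fasttrans 4 75" on a single sequence.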

This is ANSI C code and compiles cleanly on VMS and Linux (and probably
everywhere else).

Regards,

David Mathog
mathog at seqaxp.bio.caltech.edu
Manager, sequence analysis facility, biology division, Caltech 
