Python module for DNA to amino acid and reverse complement translation.

Andrew Dalke dalke at acm.org
Sun Sep 3 01:24:42 EDT 2000


Alex wrote:
>Hi.  Here is a python wrapper around some simple C functions that
>translate DNA sequences into amino acid residues and give their reverse
>complements

>No doubt there is already a module out there that does these things.
>Apologies to whoever's work I'm duplicating.

One of those places is biopython.org.  You might be interested in
seeing what we're doing.

>>> from Bio import Seq, utils
>>> from Bio.Alphabet import IUPAC
>>> seq = Seq.Seq("ATATGTACTCCCATGGGGACAAATATCCTTCTGAGGGGCCACAGTCATCAC",
...               IUPAC.unambiguous_dna)
>>> utils.translate(seq)
Seq('ICTPMGTNILLRGHSHH', IUPACProtein())

It's still pretty new code, and missing some parts.  For example, I
just realized the data tables are present for revcomp, but it isn't
implemented in the core routines, though Thomas has one for the
xbbtools.  Silly us!  On the other hand, it does support different
codon tables.

> I guess it's something like 10 times faster than the pure
> python versions, but I haven't done any benchmarking.

Translating a megabase of 'A's with biopython takes 5.6 seconds.

Your pure Python '_translate' takes 6.5 seconds.  There are a few
optimization tricks you aren't using, like making everything a
local lookup.  Here's the inner loop for biopython:

    for i in range(0, n-n%3, 3):
        append(get(s[i:i+3], stop_symbol))
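
To show the local-lookup trick in context, here is a minimal
self-contained sketch.  It is not the actual biopython source: the
names BASES, AMINO, FORWARD_TABLE and translate are made up for
illustration, and I'm assuming the table leaves out the stop codons
so that a failed lookup falls through to stop_symbol, as in the loop
above.

```python
from itertools import product

# Standard genetic code, laid out in TCAG order (illustrative names,
# not biopython's).  '*' marks the three stop codons.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# Assumption: stop codons are left out of the table, so looking one
# up misses and the caller's stop_symbol is used instead.
FORWARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
    if aa != "*"
}


def translate(s, table=FORWARD_TABLE, stop_symbol="*"):
    n = len(s)
    result = []
    # Bind the bound methods to local names once, so the loop does
    # cheap local lookups instead of an attribute lookup per codon.
    append = result.append
    get = table.get
    # n - n % 3 drops a trailing partial codon, if any.
    for i in range(0, n - n % 3, 3):
        append(get(s[i:i+3], stop_symbol))
    return "".join(result)
```

One side effect of that assumption is that any codon missing from the
table, not just a real stop codon, comes out as stop_symbol.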

The C function takes 0.1 seconds, so (sadly) C is a factor of about
60 faster.

Huh.  There was a discussion a few months ago in c.l.py pointing out
that Python grows lists linearly, instead of using a scaling factor.
That makes some algorithms O(n**2) instead of O(n).  So I tried
preallocating the list, which also makes for a single malloc, and
the time went down to 4.3 seconds.  So C is only about 40 times
faster for this case, which makes sense because throwing bytes around
is something C does very well and it doesn't need to do the string
subslicing.
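
The preallocation idea looks roughly like the sketch below.  Again
this is only an illustration, not the code I timed: translate_prealloc
and DEMO_TABLE are made-up names, and the tiny table is there just to
keep the example self-contained.

```python
# Hypothetical two-entry table, just enough to run the example.
DEMO_TABLE = {"AAA": "K", "ATG": "M"}


def translate_prealloc(s, table, stop_symbol="*"):
    # Allocate the whole result list up front, one slot per complete
    # codon, instead of growing it with append().
    n_codons = len(s) // 3
    result = [stop_symbol] * n_codons
    get = table.get  # local lookup, as before
    for j in range(n_codons):
        i = 3 * j
        # Fill the slot in place; codons missing from the table
        # come out as stop_symbol.
        result[j] = get(s[i:i+3], stop_symbol)
    return "".join(result)
```

For example, translate_prealloc("AAAATGTT", DEMO_TABLE) walks two
complete codons and ignores the trailing "TT".  How much this helps
depends on how the interpreter grows lists; an implementation that
over-allocates in proportion to the list size already gets amortized
O(1) appends, so the gain there may be small.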

                    Andrew
                    dalke at acm.org





