Just like in our DNA...

Philip Lijnzaad lijnzaad at ebi.ac.uk
Tue Oct 5 17:45:20 EDT 1999


> an advantage or a disadvantage to their host organism, the most likely
> outcome is that they simply continue to be propagated, mostly
> unchanged.  

yup, this is called genetic drift, and has been likened (I have no refs) to
Brownian motion in sequence space.

> It seems to me
> (but I am not a biologist!) that it would be extremely unlikely, that
> without a "Divine Planner", DNA strands would evolve in a way so as to
> be devoid of "Junk" sections.... that's just asking too much, to ask
> not just for a molecule that "does the job" of carrying the genetic
> information, but that does it with 100% efficiency.  There's not
> sufficient selective pressure for the DNA to be forced to be efficient
> (as opposed to, for example, the sunlight-collecting efficiency of a
> leaf, where efficiency matters).  The code-bases are small, molecules
> are cheap, and there's plenty of room for some extra unused bits in
> there.

I'm not going into religious debates here, but current understanding is that
this is very achievable. For small bacteria and especially viruses, the junk
is very costly indeed, and getting rid of it offers a selective advantage (and
they have little or no junk DNA).

What's more (and this I think is exceedingly interesting and has been
pointed out many times before): some viruses (I think phiX174 is one) have
double coding. To explain: each triplet of nucleotides (DNA building blocks;
there are 4 different ones) translates to one amino acid (protein building
blocks; of these there are 20 different ones). Now 4x4x4 gives 64 different
triplets, coding (with a lot of redundancy) for 20 different amino acids. This
is called the genetic code, and it is largely the same throughout all the
kingdoms (there are a few variations, but let's gloss over that). Now to
translate DNA into protein, it is critical that you know where the triplets
start; CGT TTG AAC CCC specifies a different piece of protein than
    [C] GTT TGA ACC [CC], which is exactly the same DNA but with the
translation started one nucleotide to the right. In most cases, this protein
is just nonsense, but some viruses actually make use of this: they use the
same stretch of DNA to specify different stretches of protein, basically
exploiting the redundancy in the genetic code. Very cool, and I've been told
that in the early days of computing and especially games programming, similar
tricks were done with the instruction sets of early microprocessors.
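
Since this is the Python list after all, here is a minimal sketch of the
reading-frame point, using the example sequence from above. It assumes the
standard genetic code written in the usual compact T, C, A, G order ('*'
marks a stop signal); the names BASES, AMINO_ACIDS, CODON_TABLE and
translate() are my own for illustration and do not come from any library.

```python
from itertools import product

# Standard genetic code in compact form: codons are enumerated with the
# bases in T, C, A, G order (third base varying fastest); '*' is a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 4*4*4 = 64 triplets to its one-letter amino acid code.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, offset=0):
    """Translate dna triplet by triplet, starting at position offset.

    Leftover nucleotides at the end that do not fill a whole triplet
    are ignored.
    """
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


dna = "CGTTTGAACCCC"
print(translate(dna, 0))  # frame starting at the first nucleotide
print(translate(dna, 1))  # same DNA, shifted one nucleotide to the right
```

The two print calls read the very same DNA in two different frames and give
two unrelated results; the shifted frame here even runs into a stop signal
(TGA), which is the "nonsense" case. A virus with double coding has found a
stretch where both readings happen to be useful.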

I vaguely remember that the 'digital organisms' of Thomas Ray (in his
brilliant study "Evolution, Ecology and Optimization of Digital Organisms";
see http://www.hip.atr.co.jp/~ray/pubs/tierra/tierrahtml.html) actually
discovered the very same trick. If the Tierran critters can discover this
optimization technique, I see no reason why bacteria and viruses might not
discover it as well. Ah well,

                                                                      Philip

-- 
Enough is more than a lot.
-----------------------------------------------------------------------------
Philip Lijnzaad, lijnzaad at ebi.ac.uk | European Bioinformatics Institute,rm A2-24
+44 (0)1223 49 4639                 | Wellcome Trust Genome Campus, Hinxton
+44 (0)1223 49 4468 (fax)           | Cambridgeshire CB10 1SD,  GREAT BRITAIN
PGP fingerprint: E1 03 BF 80 94 61 B6 FC  50 3D 1F 64 40 75 FB 53



