recombination variations

Peter Otten __peter__ at web.de
Wed Dec 1 15:57:56 EST 2004


David Siedband wrote:

> The problem I'm solving is to take a sequence like 'ATSGS' and make all
> the DNA sequences it represents.  The A, T, and G are fine but the S
> represents C or G.  I want to take this input:
> 
> [ [ 'A' ] , [ 'T' ] , [ 'C' , 'G' ], [ 'G' ] , [ 'C' , 'G' ] ]
> 
> and make the list:
> 
> [ 'ATCGC' , 'ATCGG' , 'ATGGC' , 'ATGGG' ]

[...]

The code you provided only addresses the first part of your problem, and so
does mine:

>>> def disambiguate(seq, alphabet):
...     return [list(alphabet.get(c, c)) for c in seq]
...
>>> alphabet = {
...     "W": "AT",
...     "S": "CG"
... }
>>> disambiguate("ATSGS", alphabet)
[['A'], ['T'], ['C', 'G'], ['G'], ['C', 'G']]

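Note that "identity entries" (e.g. mapping "A" to "A") in the alphabet
dictionary are no longer necessary, because alphabet.get(c, c) falls back to
the character itself when it has no entry. The list() call in disambiguate()
is most likely superfluous, since a string can be iterated character by
character just like a list, but I put it in to meet your spec accurately.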
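
To illustrate the two remarks above, here is a minimal sketch of the same
function without the list() call. It assumes the caller only iterates over
each position's alternatives and never needs real lists; the name "choices"
is just picked for the example.

```python
def disambiguate(seq, alphabet):
    # alphabet.get(c, c) returns the mapped alternatives if c is ambiguous,
    # and falls back to c itself otherwise, so "A", "T", "G", "C" need no
    # entries of their own.
    return [alphabet.get(c, c) for c in seq]

alphabet = {
    "W": "AT",
    "S": "CG",
}

choices = disambiguate("ATSGS", alphabet)
print(choices)  # ['A', 'T', 'CG', 'G', 'CG']

# Each string can still be turned into a list if the exact spec is required.
print([list(s) for s in choices])
```

Each element is now a string of alternatives rather than a list, which makes
no difference to code that merely loops over the characters.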

Now on to the next step :-)
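
For reference, one possible sketch of that second step. It relies on
itertools.product, which was only added in Python 2.6 and so was not
available when this was posted; the function name expand() is made up for
the example and is not taken from the original code.

```python
from itertools import product

def expand(choices):
    # product(*choices) yields one tuple per combination, picking exactly
    # one alternative from each position; join each tuple into a string.
    return ["".join(combo) for combo in product(*choices)]

choices = [['A'], ['T'], ['C', 'G'], ['G'], ['C', 'G']]
sequences = expand(choices)
print(sequences)  # ['ATCGC', 'ATCGG', 'ATGGC', 'ATGGG']
```

The number of results is the product of the number of alternatives at each
position, so long sequences with many ambiguous letters grow quickly.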

Peter



