[Tutor] how to do systematic searching in dictionary and printing it
Srinivas Iyyer
srini_iyyer_bio at yahoo.com
Thu Oct 20 18:32:02 CEST 2005
Dear group,
I have two files in text format that look like this:
File a1.txt:
>a1
TTAATTGGAACA
>a2
AGGACAAGGATA
>a3
TTAAGGAACAAA
File b1.txt:
>b1
TTAATTGGAACA
>b2
AGGTCAAGGATA
>b3
AAGGCCAATTAA
I want to check whether the two files have any sequences
(the ATGC strings) in common. Here a1 and b1 have identical
sequences, so I want to select them and print their headers
(the lines starting with the > symbol), tab-separated:
a1 '\t' b1
Here:
A line of the form >XXXXX is called the header, and the line
that follows it is the sequence. In bioinformatics this is
called FASTA format. What I am doing is matching the
sequences (these are always 25-mers in this instance)
and, if they match, asking Python to write
header + '\t' + header
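# a is assumed to hold the lines of a1.txt (headers and sequences alternate)
ak = a[1::2]   # sequences
av = a[::2]    # headers
seq_dict = dict(zip(ak, av))   # sequence -> header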
**************************************
>>>seq_dict
{'TTAAGGAACAAA': '>a3', 'AGGACAAGGATA': '>a2',
'TTAATTGGAACA': '>a1'}
**************************************
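For reference, here is a minimal sketch of how the lists and the dictionary above could be built from FASTA text. It is written for Python 3, the function name read_fasta_pairs is hypothetical, and it assumes every header line is followed by exactly one sequence line, as in the examples. The sample data is inlined instead of being read from a1.txt.

```python
def read_fasta_pairs(lines):
    """Return (header, sequence) pairs from alternating FASTA lines.

    Assumption: each '>' header is followed by exactly one sequence line.
    """
    # Drop blank lines and trailing newlines before pairing things up.
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    headers = cleaned[::2]      # even positions: '>' lines
    sequences = cleaned[1::2]   # odd positions: sequence lines
    return list(zip(headers, sequences))


# Contents of a1.txt from above, inlined so the sketch is self-contained.
a_text = """>a1
TTAATTGGAACA
>a2
AGGACAAGGATA
>a3
TTAAGGAACAAA
"""

a_pairs = read_fasta_pairs(a_text.splitlines())
# Map each sequence to its header, like seq_dict above.
seq_dict = {seq: header for header, seq in a_pairs}
print(seq_dict["TTAATTGGAACA"])
```

With a real file, the same function could be given the result of open("a1.txt").readlines() instead of a_text.splitlines().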
bv = b[1::2]
***************************************
>>>bv
['TTAATTGGAACA', 'AGGTCAAGGATA', 'AAGGCCAATTAA']
>>>for i in bv:
    if seq_dict.has_key(i):
        print seq_dict[i]
>a1
***************************************
Here a1 is the only common element.
However, I am having difficulty printing that b1 is
identical to a1.
How do I take b and do this search? It was easy for me
to take the sequence part by doing
b[1::2]. However, I also want to print that the b1 header has
the same sequence as a1:
a1 +'\t'+b1
Is there any way I can do this? This is probably very simple,
but due to a mental block I am unable to work it out.
Can anyone please help me out?
Thanks
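One possible way to get both headers, offered only as a sketch: walk the headers and sequences of b together, so that the b header is still at hand when its sequence is found in the dictionary. This is Python 3, the function name common_headers is hypothetical, and a and b are assumed to be lists of alternating header and sequence lines as described above.

```python
def common_headers(a_lines, b_lines):
    """Return (a_header, b_header) pairs whose sequences are identical."""
    # sequence -> header for the first file
    seq_dict = dict(zip(a_lines[1::2], a_lines[::2]))
    matches = []
    # Pair each header in b with the sequence on the line after it.
    for b_header, b_seq in zip(b_lines[::2], b_lines[1::2]):
        if b_seq in seq_dict:
            matches.append((seq_dict[b_seq], b_header))
    return matches


a = [">a1", "TTAATTGGAACA", ">a2", "AGGACAAGGATA", ">a3", "TTAAGGAACAAA"]
b = [">b1", "TTAATTGGAACA", ">b2", "AGGTCAAGGATA", ">b3", "AAGGCCAATTAA"]

matches = common_headers(a, b)
for a_header, b_header in matches:
    # Strip the leading '>' so the line reads a1<TAB>b1.
    print(a_header.lstrip(">") + "\t" + b_header.lstrip(">"))
```

One caveat with this layout: if two headers in the first file share the same sequence, the dictionary keeps only the last of them.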