[Tutor] how to do systematic searching in dictionary and printing it

Thu Oct 20 18:32:02 CEST 2005

Dear group,

I have two text files that look like this:

File a1.txt:
>a1
TTAATTGGAACA
>a2
AGGACAAGGATA
>a3
TTAAGGAACAAA

File b1.txt:
>b1
TTAATTGGAACA
>b2
AGGTCAAGGATA
>b3
AAGGCCAATTAA

I want to check whether the two files share any
sequences (the ATGC lines). a1 and b1 have identical
sequences, so I want to select them and print their
headers (the lines starting with the > symbol),
separated by a tab:

a1 '\t' b1

Here, a line of the form >XXXXX is called the header, and
the line following it is the sequence. In bioinformatics
this is called FASTA format. What I am doing is matching
the sequences (in this instance they are always 25-mers)
and, if they match, asking Python to write
header + '\t' + header
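Taking every second line works only while each sequence sits on a single line. Below is a minimal Python 3 sketch of a small FASTA reader that yields (header, sequence) pairs and joins wrapped sequence lines. The name `read_fasta` is hypothetical, and unlike the headers in the post it drops the leading '>'. Biopython's `SeqIO.parse(handle, "fasta")` is the library route if adding a dependency is acceptable.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Sequence lines after a header are joined, so this also copes with
    sequences wrapped over several lines. The '>' is removed from headers.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []  # start a new record
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


sample = """>a1
TTAATTGGAACA
>a2
AGGACAAGGATA
>a3
TTAAGGAACAAA
""".splitlines()

for name, seq in read_fasta(sample):
    print(name, seq)
```

With a real file, `with open("a1.txt") as fh: records = list(read_fasta(fh))` works, because a file object yields its lines.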

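ak = a[1::2]    # sequences: every second line, starting at index 1
av = a[::2]     # headers: every second line, starting at index 0
seq_dict = dict(zip(ak, av))   # maps sequence -> header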

**************************************
>>>seq_dict
{'TTAAGGAACAAA': '>a3', 'AGGACAAGGATA': '>a2',
'TTAATTGGAACA': '>a1'}
**************************************

bv = b[1::2]  

***************************************
>>>bv
['TTAATTGGAACA', 'AGGTCAAGGATA', 'AAGGCCAATTAA']

>>>for i in bv:
	if seq_dict.has_key(i):
		print seq_dict[i]

>a1

***************************************
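On Python 3, `dict.has_key()` no longer exists and `print` is a function, so the loop above needs a small translation. Here is a minimal sketch that hard-codes the data from the transcript and uses the `in` operator for the membership test:

```python
seq_dict = {"TTAAGGAACAAA": ">a3",
            "AGGACAAGGATA": ">a2",
            "TTAATTGGAACA": ">a1"}
bv = ["TTAATTGGAACA", "AGGTCAAGGATA", "AAGGCCAATTAA"]

# `s in seq_dict` replaces seq_dict.has_key(s)
matches = [seq_dict[s] for s in bv if s in seq_dict]
for header in matches:
    print(header)   # prints >a1
```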

Here a1 is the only common element.
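If the only question is which sequences the two files share, a set intersection answers it in one step. This is a Python 3 sketch. It assumes `a` and `b` are lists of stripped lines in the alternating header/sequence layout shown above, for example from `open('a1.txt').read().split()`. It returns the shared sequences themselves, not the headers.

```python
a = [">a1", "TTAATTGGAACA", ">a2", "AGGACAAGGATA", ">a3", "TTAAGGAACAAA"]
b = [">b1", "TTAATTGGAACA", ">b2", "AGGTCAAGGATA", ">b3", "AAGGCCAATTAA"]

# Sequences sit at the odd positions, as in a[1::2] and b[1::2] above.
common = set(a[1::2]) & set(b[1::2])
print(common)   # {'TTAATTGGAACA'}
```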

However, I am having difficulty printing that b1 is
identical to a1.

How do I take b and do this search? Getting the sequence
part was easy, with b[1::2]. However, I want to print
that header b1 has the same sequence as a1:

a1 + '\t' + b1

Is there any way I can do this? It must be very simple,
but I have a mental block and cannot work it out.
Can anyone please help me out?
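One possible approach is to walk b as (header, sequence) pairs with `zip(b[::2], b[1::2])` instead of taking the sequences alone. That way the b header is still available when a sequence is found in `seq_dict`. Below is a minimal Python 3 sketch. It assumes the same alternating-line lists as in the post. The helper name `matching_headers` is hypothetical.

```python
a = [">a1", "TTAATTGGAACA", ">a2", "AGGACAAGGATA", ">a3", "TTAAGGAACAAA"]
b = [">b1", "TTAATTGGAACA", ">b2", "AGGTCAAGGATA", ">b3", "AAGGCCAATTAA"]

# sequence -> header for file a (the same seq_dict as in the post)
seq_dict = dict(zip(a[1::2], a[::2]))

def matching_headers(seq_dict, b):
    """Return 'a_header<TAB>b_header' for each b record whose sequence is in seq_dict."""
    out = []
    # Pair each b header with the sequence on the following line.
    for b_header, seq in zip(b[::2], b[1::2]):
        if seq in seq_dict:
            a_header = seq_dict[seq]
            # lstrip('>') drops the FASTA marker so the line reads a1<TAB>b1
            out.append(a_header.lstrip(">") + "\t" + b_header.lstrip(">"))
    return out

for line in matching_headers(seq_dict, b):
    print(line)   # prints a1<TAB>b1
```

Note that if two records in a1.txt share a sequence, the plain dict keeps only the last header seen. `collections.defaultdict(list)` would keep all of them.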

Thanks
