[Tutor] how to do systematic searching in dictionary and printing it

Srinivas Iyyer srini_iyyer_bio at yahoo.com
Thu Oct 20 18:32:02 CEST 2005


Dear group,


I have two files in text format that look like this:


File a1.txt:
>a1
TTAATTGGAACA
>a2
AGGACAAGGATA
>a3
TTAAGGAACAAA



File b1.txt:
>b1
TTAATTGGAACA
>b2
AGGTCAAGGATA
>b3
AAGGCCAATTAA


I want to check whether the two files share any elements, based on
their ATGC sequences. a1 and b1 have identical sequences, so
I want to select them and print their headers (the lines starting
with the > symbol):

a1 '\t' b1



Here:
>XXXXX is called the header, and the line following the > line
is the sequence. In bioinformatics this is called FASTA
format. What I am doing here is matching the
sequences (these are always 25-mers in this instance),
and if they match, I am asking Python to write
header + '\t' + header
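
As a minimal sketch of that layout, assuming every record is exactly one header line followed by one sequence line, the lines can be split into (header, sequence) pairs. The function name pair_records and the inline sample lines are only for illustration; they do not come from any library.

```python
def pair_records(lines):
    """Pair each '>' header line with the sequence line that follows it."""
    # Strip newlines and skip blank lines.
    cleaned = [line.strip() for line in lines if line.strip()]
    headers = cleaned[::2]   # even positions: header lines
    seqs = cleaned[1::2]     # odd positions: sequence lines
    return list(zip(headers, seqs))


# Sample lines standing in for the contents of a file such as a1.txt.
sample_lines = [">a1\n", "TTAATTGGAACA\n", ">a2\n", "AGGACAAGGATA\n"]
records = pair_records(sample_lines)
for header, seq in records:
    print(header + "\t" + seq)
```

This only holds for single-line sequences; a record whose sequence wraps over several lines would need a different approach.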


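# a holds the lines of a1.txt in order: header, sequence, header, sequence, ...
ak = a[1::2]   # sequences (used as dictionary keys)
av = a[::2]    # headers (used as dictionary values)
seq_dict = dict(zip(ak, av))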

**************************************
>>>seq_dict
{'TTAAGGAACAAA': '>a3', 'AGGACAAGGATA': '>a2',
'TTAATTGGAACA': '>a1'}
**************************************



bv = b[1::2]  

***************************************
>>>bv
['TTAATTGGAACA', 'AGGTCAAGGATA', 'AAGGCCAATTAA']


>>>for i in bv:
	if i in seq_dict:
		print seq_dict[i]

		
>a1

***************************************

Here a1 is the only common element.

However, I am having difficulty printing that b1 is
identical to a1.


How do I take b and do this search? It was easy for me
to take the sequence part by doing

b[1::2]. However, I want to print that the b1 header has the same
sequence as a1:

a1 +'\t'+b1

Is there any way I can do this? This is very simple, but
due to a mental block I am unable to work it out.
Can anyone please help me out?
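
For reference, one possible way to get this pairing is sketched below. It assumes a and b are the lists of stripped lines from a1.txt and b1.txt (header, sequence, header, sequence, ...). The idea is to keep each b header together with its sequence instead of discarding it, then look the b sequence up in the dictionary built from a. The helper name matching_headers is made up for illustration.

```python
def matching_headers(a, b):
    """Return (a_header, b_header) pairs whose sequences are identical."""
    # Map each sequence in a to its header, like seq_dict above.
    seq_dict = dict(zip(a[1::2], a[::2]))
    matches = []
    # Walk b as (header, sequence) pairs so the header stays available.
    for b_header, b_seq in zip(b[::2], b[1::2]):
        if b_seq in seq_dict:
            matches.append((seq_dict[b_seq], b_header))
    return matches


a = [">a1", "TTAATTGGAACA", ">a2", "AGGACAAGGATA", ">a3", "TTAAGGAACAAA"]
b = [">b1", "TTAATTGGAACA", ">b2", "AGGTCAAGGATA", ">b3", "AAGGCCAATTAA"]
matches = matching_headers(a, b)
for a_header, b_header in matches:
    print(a_header + "\t" + b_header)
```

One caveat of this sketch: if two records in a share the same sequence, the dictionary keeps only the last header for it.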

Thanks



	
		

