[Tutor] Selecting text

kumar s ps_python at yahoo.com
Wed Jan 19 05:12:32 CET 2005


Dear group:

I have two lists:

1. Lseq:

>>> len(Lseq)
30673
>>> Lseq[20:25]
['NM_025164', 'NM_025164', 'NM_012384', 'NM_006380',
'NM_007032','NM_014332']


2. refseq:
>>> len(refseq)
1080945
>>> refseq[0:25]
['>gi|10047089|ref|NM_014332.1| Homo sapiens small
muscle protein, X-linked (SMPX), mRNA',
'GTTCTCAATACCGGGAGAGGCACAGAGCTATTTCAGCCACATGAAAAGCATCGGAATTGAGATCGCAGCT',
'CAGAGGACACCGGGCGCCCCTTCCACCTTCCAAGGAGCTTTGTATTCTTGCATCTGGCTGCCTGGGACTT',
'CCCTTAGGCAGTAAACAAATACATAAAGCAGGGATAAGACTGCATGAATATGTCGAAACAGCCAGTTTCC',
'AATGTTAGAGCCATCCAGGCAAATATCAATATTCCAATGGGAGCCTTTCGGCCAGGAGCAGGTCAACCCC',
'CCAGAAGAAAAGAATGTACTCCTGAAGTGGAGGAGGGTGTTCCTCCCACCTCGGATGAGGAGAAGAAGCC',
'AATTCCAGGAGCGAAGAAACTTCCAGGACCTGCAGTCAATCTATCGGAAATCCAGAATATTAAAAGTGAA',
'CTAAAATATGTCCCCAAAGCTGAACAGTAGTAGGAAGAAAAAAGGATTGATGTGAAGAAATAAAGAGGCA',
'GAAGATGGATTCAATAGCTCACTAAAATTTTATATATTTGTATGATGATTGTGAACCTCCTGAATGCCTG',
'AGACTCTAGCAGAAATGGCCTGTTTGTACATTTATATCTCTTCCTTCTAGTTGGCTGTATTTCTTACTTT',
'ATCTTCATTTTTGGCACCTCACAGAACAAATTAGCCCATAAATTCAACACCTGGAGGGTGTGGTTTTGAG',
'GAGGGATATGATTTTATGGAGAATGATATGGCAATGTGCCTAACGATTTTGATGAAAAGTTTCCCAAGCT',
'ACTTCCTACAGTATTTTGGTCAATATTTGGAATGCGTTTTAGTTCTTCACCTTTTAAATTATGTCACTAA',
'ACTTTGTATGAGTTCAAATAAATATTTGACTAAATGTAAAATGTGA',
'>gi|10047091|ref|NM_013259.1| Homo sapiens neuronal
protein (NP25), mRNA',
'TGTGCTGCTATTGTGTGGATGCCGCGCGTGTCTTCTCTTCTTTCCAGAGATGGCTAACAGGGGCCCGAGC',
'TATGGCTTAAGCCGAGAGGTGCAGGAGAAGATCGAGCAGAAGTATGATGCGGACCTGGAGAACAAGCTGG',
'TGGACTGGATCATCCTGCAGTGCGCCGAGGACATAGAGCACCCGCCCCCCGGCAGGGCCCATTTTCAGAA',
'ATGGTTAATGGACGGGACGGTCCTGTGCAAGCTGATAAATAGTTTATACCCACCAGGACAAGAGCCCATA',
'CCCAAGATCTCAGAGTCAAAGATGGCTTTTAAGCAGATGGAGCAAATCTCCCAGTTCCTAAAAGCTGCGG',
'AGACCTATGGTGTCAGAACCACCGACATCTTTCAGACGGTGGATCTATGGGAAGGGAAGGACATGGCAGC',
'TGTGCAGAGGACCCTGATGGCTTTAGGCAGCGTTGCAGTCACCAAGGATGATGGCTGCTATCGGGGAGAG',
'CCATCCTGGTTTCACAGGAAAGCCCAGCAGAATCGGAGAGGCTTTTCCGAGGAGCAGCTTCGCCAGGGAC',
'AGAACGTAATAGGCCTGCAGATGGGCAGCAACAAGGGAGCCTCCCAGGCGGGCATGACAGGGTACGGGAT',
'GCCCAGGCAGATCATGTTAGGACGCGGCATCCTGCCCCTGGTAGAGAGGACGAATGTTCCACACCATGGT']


If Lseq[i] is present in refseq[k], then I am
interested in printing starting from refseq[k] until
the element that starts with '>' sign. 

my Lseq has NM_014332 element and this is also present
in second list refseq. I want to print starting from
element where NM_014332 is present until next element
that starts with '>' sign.

In this case, it would be:
'>gi|10047089|ref|NM_014332.1| Homo sapiens small
muscle protein, X-linked (SMPX), mRNA',
'GTTCTCAATACCGGGAGAGGCACAGAGCTATTTCAGCCACATGAAAAGCATCGGAATTGAGATCGCAGCT',
'CAGAGGACACCGGGCGCCCCTTCCACCTTCCAAGGAGCTTTGTATTCTTGCATCTGGCTGCCTGGGACTT',
'CCCTTAGGCAGTAAACAAATACATAAAGCAGGGATAAGACTGCATGAATATGTCGAAACAGCCAGTTTCC',
'AATGTTAGAGCCATCCAGGCAAATATCAATATTCCAATGGGAGCCTTTCGGCCAGGAGCAGGTCAACCCC',
'CCAGAAGAAAAGAATGTACTCCTGAAGTGGAGGAGGGTGTTCCTCCCACCTCGGATGAGGAGAAGAAGCC',
'AATTCCAGGAGCGAAGAAACTTCCAGGACCTGCAGTCAATCTATCGGAAATCCAGAATATTAAAAGTGAA',
'CTAAAATATGTCCCCAAAGCTGAACAGTAGTAGGAAGAAAAAAGGATTGATGTGAAGAAATAAAGAGGCA',
'GAAGATGGATTCAATAGCTCACTAAAATTTTATATATTTGTATGATGATTGTGAACCTCCTGAATGCCTG',
'AGACTCTAGCAGAAATGGCCTGTTTGTACATTTATATCTCTTCCTTCTAGTTGGCTGTATTTCTTACTTT',
'ATCTTCATTTTTGGCACCTCACAGAACAAATTAGCCCATAAATTCAACACCTGGAGGGTGTGGTTTTGAG',
'GAGGGATATGATTTTATGGAGAATGATATGGCAATGTGCCTAACGATTTTGATGAAAAGTTTCCCAAGCT',
'ACTTCCTACAGTATTTTGGTCAATATTTGGAATGCGTTTTAGTTCTTCACCTTTTAAATTATGTCACTAA',
'ACTTTGTATGAGTTCAAATAAATATTTGACTAAATGTAAAATGTGA'

I could not think of any smart way to do this,
although I have tried like this:

>>> for ele1 in Lseq:
	for ele2 in refseq:
		if ele1 in ele2:
			k = ele2
			s = refseq[ele2].startswith('>')
			print k,s

			

Traceback (most recent call last):
  File "<pyshell#261>", line 5, in -toplevel-
    s = refseq[ele2].startswith('>')
TypeError: list indices must be integers


I do not know how to dictate to python to select lines
between two > symbols. 

Could any one help me thanks. 

K


		
__________________________________ 
Do you Yahoo!? 
Yahoo! Mail - 250MB free storage. Do more. Manage less. 
http://info.mail.yahoo.com/mail_250


More information about the Tutor mailing list