[Tutor] Simple string processing problem

Mark Thomas thomas.s.mark at gmail.com
Sat May 14 08:04:41 CEST 2005


On 13 May 2005 21:59:58 +0100, cgw501 at york.ac.uk <cgw501 at york.ac.uk> wrote:
> The file:
> 
> Scer            ACTAACAAGCAAAATGTTTTGTTTCTCCTTTT-AAAATAGTACTGCTGTTTCTCAAGCTG
> Spar            actaacaagcaaaatgttttgtttctcctttt-aaaatagtactgctgtttctcaagctg
> Smik            actaacaagcaaaatgtttcttttctcttttttgaaatagtactgctgcttctcaagctg
> Sbay            actaacaagcaaaaactttttgttttatt----gaaatagtactgctgtctctcaagctg
>                 ****  * ************** **   ********  ***   ***** *******  *
> 
> Scer            GGGGTGCTCACCAATTTATCCCAATTGGTTTCGGTATCAAGAAGTTGCAAATTAACTGTG
> Spar            GGGGTGCTCACCAATTTATCCCAATTGGTTTCGGTATCAAGAAGTTGCAAATTAACTGTG
> Smik            GGGGTGCTCACCAATTCATCCCAATTGGTTTCGGTATCAAGAAGTTGCAAATTAACTGTG
> Sbay            GGGGTGCTCACCAATTCATCCCAATTGGTTTCGGTATCAAGAAATTGCAAATTAACTGTG
>                 * ********** *********  **** *********   *  ** ***** ** ****
> 
> Scer            ACCACGTCCAATCTACCGATATTGCTGCTATGCAAAAATTATAAaaggctttttt-ataa
> Spar            ACCACGTCCAATCTACCGATATTGCTGCTATGCAAAAATTATAAaaagctttttttataa
> Smik            ACCACGTCCAATCTACCGATATTGCTGCTATGCAAAAATTATAAgaagctttttctataa
> Sbay            ACCACGTCCAATCTACCGATATTGCTGCTATGCAAAAATTATAAgaagctttttctataa
>                 ******************************************** * *******  ****
> 
> Scer            actttttataatt----aacattaa-------agcaaaaacaacattgtaaagattaaca
> Spar            actttttataata----aacatcaa-------agcaaaaacaacattgtaaagattaaca
> Smik            actttttataatt----aacatcgacaaaaacgacaacaacaacattgtaaagattaaca
> Sbay            actttttataacttagcaacaacaacaacaacaacatcaacaacattgtaaagattaaca
>                 ***********      ****   *         **  **********************

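How about some RE (regular expression) action? This keeps each species
name plus the run of uppercase bases at the start of the line: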

>>> import re
>>> pat = re.compile(r'^(S[a-z]{3}\s*[A-Z]+).*$')  # raw string keeps \s intact
>>> fr = open('dna', 'r')  # open() rather than the file() builtin
>>> for line in fr:
...     m = pat.match(line)
...     if m:
...         print(m.group(1))
...
Scer            ACTAACAAGCAAAATGTTTTGTTTCTCCTTTT
Scer            GGGGTGCTCACCAATTTATCCCAATTGGTTTCGGTATCAAGAAGTTGCAAATTAACTGTG
Spar            GGGGTGCTCACCAATTTATCCCAATTGGTTTCGGTATCAAGAAGTTGCAAATTAACTGTG
Smik            GGGGTGCTCACCAATTCATCCCAATTGGTTTCGGTATCAAGAAGTTGCAAATTAACTGTG
Sbay            GGGGTGCTCACCAATTCATCCCAATTGGTTTCGGTATCAAGAAATTGCAAATTAACTGTG
Scer            ACCACGTCCAATCTACCGATATTGCTGCTATGCAAAAATTATAA
Spar            ACCACGTCCAATCTACCGATATTGCTGCTATGCAAAAATTATAA
Smik            ACCACGTCCAATCTACCGATATTGCTGCTATGCAAAAATTATAA
Sbay            ACCACGTCCAATCTACCGATATTGCTGCTATGCAAAAATTATAA
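
Two things to watch in that output: [A-Z]+ stops at the first character
that is not an uppercase letter, so the first Scer line is cut off at the
"-" gap, and any line whose sequence starts in lowercase is skipped
altogether. Here is the same idea as a small function, in case it is handier
to test that way. It is only a sketch: the name leading_uppercase and the
shortened sample lines are made up for illustration and are not from your
file.

```python
import re

# Same pattern as in the session above, written as a raw string.
PAT = re.compile(r'^(S[a-z]{3}\s*[A-Z]+).*$')


def leading_uppercase(lines):
    """Return 'name + leading uppercase run' for every line that matches.

    Hypothetical helper for illustration; lines that do not start with a
    species name followed by uppercase bases are silently skipped.
    """
    result = []
    for line in lines:
        m = PAT.match(line)
        if m:
            result.append(m.group(1))
    return result


# Shortened, made-up lines in the same layout as the alignment file.
sample = [
    "Scer            ACTAACAAGC-AAAA",  # uppercase run ends at the gap
    "Spar            actaacaagc-aaaa",  # starts lowercase: no match
    "Scer            ATTATAAaaggct",    # uppercase run ends at lowercase
    "                ****  *",          # consensus line: no match
]

for row in leading_uppercase(sample):
    print(row)
```

If the gap characters should count as part of the run, widening the class
to [A-Z-]+ would be one way to do it, but that depends on what you want to
do with the gaps.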


-- 
 _
( ) Mark Thomas     ASCII ribbon campaign
 X www.theswamp.org   - against HTML email
/ \

