[Tutor] Deleting strings from a line

Peter Otten __peter__ at web.de
Tue Apr 26 14:22:40 CEST 2011


Spyros Charonis wrote:

> Hello,
> 
> I've written a script that scans a biological database and extracts some
> information. A sample of output from my script is as follows:
> 
> LYLGILLSHAN                      AA3R_SHEEP    263    31
> 
>  LYMGILLSHAN                      AA3R_HUMAN    264    31
> 
>  MCLGILLSHAN                        AA3R_RAT    266    31
> 
>  LLVGILLSHAN                      AA3R_RABIT    265    31
> 
> The leftmost strings are the ones I want to keep, while I would like to
> get rid of the ones to the right (AA3R_SHEEP, 263 61) which are just
> indicators of where the sequence came from and genomic coordinates. Is
> there any way to do this with a string processing command? The loop which
> builds my list goes like this:
> 
>  for line in query_lines:
>             if line.startswith('fd;'):  # find motif sequences
>                 #print "Found an FD for your query!", line.rstrip().lstrip('fd;')
>                 print line.lstrip('fd;')
>                 motif.append(line.rstrip().lstrip('fd;'))
> 
> Is there a del command I can use to preserve only the actual sequences
> themselves. Many thanks in advance!

You don't have to delete; instead extract the piece you are interested in:

with open("prints41_1.kdat") as instream:
    for line in instream:
        if line.startswith("fd;"):
            print line.split()[1]

To see what the last line does, let's perform it in two steps:

>>> line = 'fd; RVNIENPSRADSYNPRAG             A1YQH4_ORYSJ    310   310\n'
>>> parts = line.split()
>>> parts
['fd;', 'RVNIENPSRADSYNPRAG', 'A1YQH4_ORYSJ', '310', '310']
>>> wanted = parts[1]
>>> wanted
'RVNIENPSRADSYNPRAG'
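
If you want the sequences collected in a list (like your motif list) rather than printed, the same idea can be wrapped in a small function. This is only a sketch: the name extract_motifs and the second sample line are made up for illustration, and it assumes the sequence is always the second whitespace-separated field, as in the line shown above.

```python
def extract_motifs(lines):
    """Return the second field of every line that starts with 'fd;'."""
    motifs = []
    for line in lines:
        if not line.startswith("fd;"):
            continue
        parts = line.split()  # split on any run of whitespace
        if len(parts) > 1:    # skip a bare "fd;" line with no sequence
            motifs.append(parts[1])
    return motifs


sample = [
    "fd; RVNIENPSRADSYNPRAG             A1YQH4_ORYSJ    310   310\n",
    "xx; a made-up line that is not an fd record\n",
]
print(extract_motifs(sample))
```

The function accepts any iterable of lines, so you can pass it the open file object or your query_lines list directly.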



