[Tutor] Deleting strings from a line
Peter Otten
__peter__ at web.de
Tue Apr 26 14:22:40 CEST 2011
Spyros Charonis wrote:
> Hello,
>
> I've written a script that scans a biological database and extracts some
> information. A sample of output from my script is as follows:
>
> LYLGILLSHAN AA3R_SHEEP 263 31
>
> LYMGILLSHAN AA3R_HUMAN 264 31
>
> MCLGILLSHAN AA3R_RAT 266 31
>
> LLVGILLSHAN AA3R_RABIT 265 31
>
> The leftmost strings are the ones I want to keep, while I would like to
> get rid of the ones to the right (AA3R_SHEEP, 263 31) which are just
> indicators of where the sequence came from and genomic coordinates. Is
> there any way to do this with a string processing command? The loop which
> builds my list goes like this:
>
> for line in query_lines:
>     if line.startswith('fd;'): # find motif sequences
>         #print "Found an FD for your query!",
>         line.rstrip().lstrip('fd;')
>         print line.lstrip('fd;')
>         motif.append(line.rstrip().lstrip('fd;'))
>
> Is there a del command I can use to preserve only the actual sequences
> themselves? Many thanks in advance!
You don't have to delete; instead extract the piece you are interested in:

with open("prints41_1.kdat") as instream:
    for line in instream:
        if line.startswith("fd;"):
            print line.split()[1]

To see what the last line does, let's perform it in two steps:
>>> line = 'fd; RVNIENPSRADSYNPRAG A1YQH4_ORYSJ 310 310\n'
>>> parts = line.split()
>>> parts
['fd;', 'RVNIENPSRADSYNPRAG', 'A1YQH4_ORYSJ', '310', '310']
>>> wanted = parts[1]
>>> wanted
'RVNIENPSRADSYNPRAG'
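
If you want the sequences collected in a list, as your motif.append() call suggests, the same split-and-index idea can be wrapped in a small function. This is only a sketch: the name extract_motifs and the sample lines are made up for illustration (the second and third lines assume the file's other records look like this), and the length check is an extra guard against lines that have the "fd;" marker but no sequence after it.

```python
def extract_motifs(lines, prefix="fd;"):
    """Return the second whitespace-separated field of every line
    that starts with prefix (hypothetical helper, not part of any library)."""
    motifs = []
    for line in lines:
        if line.startswith(prefix):
            parts = line.split()
            # Skip malformed lines that carry the marker but no sequence.
            if len(parts) > 1:
                motifs.append(parts[1])
    return motifs


# Sample input; only the first line is taken from the session above,
# the other two are assumed for illustration.
sample = [
    "fd; RVNIENPSRADSYNPRAG A1YQH4_ORYSJ 310 310\n",
    "gc; some other record\n",
    "fd; LYLGILLSHAN AA3R_SHEEP 263 31\n",
]
motifs = extract_motifs(sample)
```

Because split() with no arguments splits on any run of whitespace and discards the trailing newline, no rstrip() or lstrip() calls are needed. Passing an open file object instead of the sample list should work the same way, since the function only iterates over lines.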