[Tutor] Script for Parsing string sequences from a file

Joel Goldstick joel.goldstick at gmail.com
Fri Apr 15 14:54:19 CEST 2011


On Fri, Apr 15, 2011 at 8:41 AM, Spyros Charonis <s.charonis at gmail.com> wrote:

> Hello,
>
> I'm doing a biomedical degree and am taking a course on bioinformatics. We
> were given a raw version of a public database in a file (the file is in
> simple ASCII) and need to extract only certain lines containing important
> information. I've made a script that does not work and I am having trouble
> understanding why.
>
> When I run it in the Python shell, it prompts for a protein name but then
> reports that there is no such entry. The first while loop nested inside a
> for loop is intended to pick up all lines beginning with "gc;", chop off the
> "gc;" part and keep only the text after that (which is a protein name).
> Then it scans the file and collects all lines, chops the "gc;" and stores
> them in a tuple. This tuple is not built correctly because, as I said, when
> the program is run it reports that it cannot find my query in the tuple
> I created, and it is certainly in the database. Can you detect what the
> mistake is? Thank you in advance!
>
> Spyros
>
import os, string

printsdb = open('/users/spyros/folder1/python/PRINTSmotifs/prints41_1.kdat', 'r')
lines = printsdb.readlines()

# find PRINTS name entries
# you need to have a list to collect your strings:
protnames = []
for line in lines:   # this gets you each line
    #while line.startswith('gc;'):  this is wrong
    if line.startswith('gc;'):     # do this instead
        # slice off the 'gc;' prefix and strip the trailing newline, then add
        # the name to the protnames list (lstrip('gc;') removes any leading
        # 'g', 'c' or ';' characters, not the prefix, and keeps the newline)
        protnames.append(line[len('gc;'):].strip())

if not protnames:   # an empty list means no 'gc;' lines were found
    print('error in creating list')
#print(protnames)
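
To see why the lookup can fail even when the name really is in the file, here is a small sketch using made-up lines. The sample names and the helpers name_via_lstrip and name_via_slice are hypothetical and not part of the original script. Lines returned by readlines() keep their trailing newline, and lstrip() removes a set of characters rather than a prefix:

```python
def name_via_lstrip(line):
    # lstrip('gc;') removes any leading 'g', 'c' or ';' characters and
    # leaves the trailing newline in place.
    return line.lstrip('gc;')


def name_via_slice(line):
    # Drop exactly the three-character prefix, then trim surrounding
    # whitespace, including the newline.
    return line[len('gc;'):].strip()


# Hypothetical sample lines; the real file contents are not shown here.
upper_line = 'gc;GLUCAGON\n'
lower_line = 'gc;calcitonin\n'

# The newline stays attached, so an exact membership test fails.
print(repr(name_via_lstrip(upper_line)))
print('GLUCAGON' in [name_via_lstrip(upper_line)])

# A name that starts with 'c' or 'g' also loses its first letter.
print(repr(name_via_lstrip(lower_line)))

# Slicing plus strip() gives the bare name in both cases.
print(repr(name_via_slice(upper_line)))
print(repr(name_via_slice(lower_line)))
```

With names stored this way, the "query in protnames" test compares like with like.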

query = input("search a protein: ")
query = query.upper()
if query in protnames:
    print("\nDisplaying Motifs")
else:
    print("\nentry not in database")

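# Parse motifs
def extract_motifs(query):
    # NOTE: this assumes that the 'ft;' (motif id) and 'fd;' (motif) lines
    # for an entry follow its 'gc;' line, up to the next 'gc;' line.
    # Check that against the actual layout of the file.
    motif_ids = []
    motifs = []
    in_entry = False
    for line in lines:   # use if tests inside one for loop, not nested whiles
        if line.startswith('gc;'):
            in_entry = (line[len('gc;'):].strip() == query)
        elif in_entry and line.startswith('ft;'):
            motif_ids.append(line[len('ft;'):].strip())
        elif in_entry and line.startswith('fd;'):
            motifs.append(line[len('fd;'):].strip())
    return motif_ids, motifs

if __name__ == '__main__':
    final_motifs = extract_motifs(query)   # pass the variable, not 'query'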
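
As one possible alternative, the file could be read once into a dictionary keyed by protein name, so that the membership test and the motif lookup become a single step. This is only a sketch: the thread does not show the file, so the layout (each entry starts with a 'gc;' line and its 'ft;' and 'fd;' lines follow it), the sample data, and the name index_entries are all assumptions.

```python
import io

# Made-up sample standing in for the real database file.
SAMPLE = io.StringIO(
    "gc;PROTEINA\n"
    "ft;MOTIF1\n"
    "fd;AAAA\n"
    "ft;MOTIF2\n"
    "fd;CCCC\n"
    "gc;PROTEINB\n"
    "ft;MOTIF3\n"
    "fd;GGGG\n"
)


def index_entries(handle):
    """Map each 'gc;' name to the 'ft;' and 'fd;' values that follow it."""
    entries = {}
    current = None
    for line in handle:
        prefix, value = line[:3], line[3:].strip()
        if prefix == 'gc;':
            # Start (or reuse) the record for this protein name.
            current = entries.setdefault(value, {'ft': [], 'fd': []})
        elif prefix in ('ft;', 'fd;') and current is not None:
            # File the value under 'ft' or 'fd' for the current entry.
            current[prefix[:2]].append(value)
    return entries


entries = index_entries(SAMPLE)
print(entries.get('PROTEINA'))   # the record collected for this name
print(entries.get('MISSING'))    # None when the name is not in the file
```

With the real file, the handle would come from open() on the database path instead of the in-memory sample.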



-- 
Joel Goldstick