[Tutor] next element in list

Wed Feb 26 13:29:04 CET 2014

rahmad akbar wrote:

> hey guys
> 
> i have this file i wish to parse, the file looks something like below.
> there are only four entries here (AaaI, AacLI, AaeI, AagI). the complete
> file contains thousands of entries
> 
>     =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>     REBASE, The Restriction Enzyme Database   http://rebase.neb.com
>     Copyright (c)  Dr. Richard J. Roberts, 2014.   All rights reserved.
>     =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> 
> Rich Roberts                                                    Jan 30 2014
> 
> AaaI (XmaIII)                     C^GGCCG
> AacLI (BamHI)                     GGATCC
> AaeI (BamHI)                      GGATCC
> AagI (ClaI)                       AT^CGAT
> 
> 
> the strategy was to mark the string 'Rich Roberts' as the start. i wrote
> the following function. but then i realized i couldn't do something like
> .next() on the var in_file, which is a list. so i added a flag, start =
> False, which is set to True once 'Rich Roberts' is found. is there
> any simpler way to move to the next element in the list, like a built-in
> method or something like that?
> 
> def read_bionet(bionetfile):
>   res_enzime_dict = {}
>   in_file = open(bionetfile, 'r').readlines()
>   start = False
>   for line in in_file:
>     if line.startswith('Rich Roberts'):
>       start = True
>     if start and len(line) >= 10:
>         line = line.split()
>         res_enzime_dict[line[0]] = line[-1]
>   return res_enzime_dict
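
To answer the question literally: a list has no .next() method, but the built-in iter() gives you an iterator over it, and the built-in next() (Python 2.6 and later) advances that iterator one element at a time. The iterator remembers its position, so a for loop that breaks early leaves it pointing just past the line it stopped on. A minimal sketch, with a few made-up lines standing in for the readlines() result:

```python
# Made-up sample lines standing in for what readlines() returns.
lines = [
    "Rich Roberts    Jan 30 2014\n",
    "\n",
    "AaaI (XmaIII)    C^GGCCG\n",
    "AacLI (BamHI)    GGATCC\n",
]

it = iter(lines)  # list iterator: remembers how far it has got

# Advance until the marker line; the loop consumes the marker too.
for line in it:
    if line.startswith("Rich Roberts"):
        break

blank = next(it)           # the element right after the marker
first_entry = next(it)     # the element after that
rest = list(it)            # everything that is left
leftover = next(it, None)  # exhausted: returns the default instead of raising StopIteration
```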

As David says, don't call readlines(), which reads all lines of the file into
a list; iterate over the file directly:

def read_bionet(bionetfile):
    with open(bionetfile) as in_file:
        # skip header
        for line in in_file:
            if line.startswith("Rich Roberts"):
                break

        # populate dict
        res_enzimes = {}
        for line in in_file: # continues after the line with R. R.
            if len(line) >= 10:
                parts = line.split()
                res_enzimes[parts[0]] = parts[-1]

        # file will be closed now rather than at 
        # the garbage collector's discretion

    return res_enzimes
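
The second loop picks up where the first one stopped because a file object is its own iterator: break leaves it positioned just after the 'Rich Roberts' line, so that line never reaches the dict. To try the same two-loop logic without a REBASE file, here is a sketch that runs it on io.StringIO (an in-memory file that iterates the same way). parse_lines is a hypothetical name for a variant that accepts any iterable of lines, and the sample text assumes the layout quoted above:

```python
import io

SAMPLE = """\
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    REBASE, The Restriction Enzyme Database   http://rebase.neb.com

Rich Roberts                                                    Jan 30 2014

AaaI (XmaIII)                     C^GGCCG
AacLI (BamHI)                     GGATCC
AaeI (BamHI)                      GGATCC
AagI (ClaI)                       AT^CGAT
"""

def parse_lines(lines):
    """Same logic as read_bionet(), but takes any iterable of lines."""
    # For a file, iter() returns the file itself; for a list it returns
    # a list iterator. Either way both loops share one position.
    it = iter(lines)

    # skip header, including the 'Rich Roberts' line itself
    for line in it:
        if line.startswith("Rich Roberts"):
            break

    # populate dict from the remaining lines
    res_enzimes = {}
    for line in it:
        if len(line) >= 10:
            parts = line.split()
            res_enzimes[parts[0]] = parts[-1]
    return res_enzimes

from_file = parse_lines(io.StringIO(SAMPLE))
from_list = parse_lines(SAMPLE.splitlines(True))
```

The iter() call is what lets the same function accept a plain list: without it, a second for loop over a list would start again from the first element, which is exactly the problem that led to the start flag.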