[Tutor] Fastest way to iterate through a file

Tue Jun 26 17:03:49 CEST 2007

On Tue, Jun 26, 2007 at 10:04:07AM -0400, Kent Johnson wrote:
> Robert Hicks wrote:
> > This is the loop code:
> > 
> > for line in f2:
> >      for id in idList:
> >          if id in line:
> >              print "%s: %s" % (id, f2.next())
> >              found = "%s: %s" % (id, f2.next())
> >              f3.write(found)
> > 
> > 
> > I have a list, idList[], that contains a list of id numbers. That code 
> > will loop through the f2 file and for lines that have an id on it it will 
> > print the "next" line (so I can see what it is doing) and write it to a 
> > file. I will turn off that screen print after I get it going the way I 
> > want it to.
> 
> I don't see any particular reason this should be slow unless idList is 
> large. Perhaps the output is being buffered somewhere and not appearing 
> until the process is done? How are you running the program?
> 

I think Kent is right.  You probably have a solution that is good
enough.  Ask yourself whether it really is going to save any time
if you are able to optimize it.

But, if speed is important, then you might try a solution that uses
a regular expression in place of

     for id in idList:
         if id in line:

A regular expression of the form "(id1|id2|id3)", which you could
construct programmatically, might work for your problem:

    import re
    # Escape each id so any regex metacharacters in it match literally.
    s1 = '|'.join(re.escape(id) for id in idList)
    s2 = '(%s)' % s1
    pat = re.compile(s2)

Then use something like:

    for mo in pat.finditer(line):
        id = mo.group(1)    # the id that matched on this line
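
Putting the pieces together, here is a rough sketch of how the whole
loop might look. It is untested against your data and rests on a few
assumptions of mine: the function name find_following_lines and the
sample data are made up for illustration, each line is assumed to
contain at most one id, and the ids are sorted longest first so that
"123" is preferred over "12" when both are in the list. It also reads
the following line only once; in the loop you posted, f2.next() is
called twice, so the line that is printed and the line that is written
are two different lines.

```python
import re


def find_following_lines(lines, id_list):
    """Return "id: next line" strings for each line that contains an id.

    Hypothetical helper: `lines` is any iterable of strings (an open
    file works), `id_list` is a list of id strings.
    """
    if not id_list:
        # An empty alternation would match every line, so bail out early.
        return []
    # Longest ids first, so a longer id wins over a shorter prefix of it.
    ordered = sorted(id_list, key=len, reverse=True)
    # re.escape makes any regex metacharacters in an id match literally.
    pat = re.compile('|'.join(re.escape(i) for i in ordered))

    results = []
    it = iter(lines)
    for line in it:
        mo = pat.search(line)
        if mo is None:
            continue
        # Fetch the following line once and reuse it.
        following = next(it, None)
        if following is None:
            break  # the id was on the last line; nothing follows it
        results.append("%s: %s" % (mo.group(0), following))
    return results


if __name__ == "__main__":
    sample = ["header A1\n", "payload one\n", "nothing\n",
              "header B2\n", "payload two\n"]
    for found in find_following_lines(sample, ["A1", "B2"]):
        print(found.rstrip())
```

With your files, you would pass f2 as `lines` and write each returned
string to f3. Whether this is actually faster than the nested loop
depends on how long idList is, so it is worth timing both.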

Dave

-- 
Dave Kuhlman
http://www.rexx.com/~dkuhlman