Newbie tip: Prg speed from ~5 hrs to < 1 minute

Terry Byrne TerryByrne1963 at yahoo.com
Mon Apr 29 10:18:24 EDT 2002


Fernando,

With pleasure. Here is some detail on the 'after'; I hope not too
much! The 'before' was simpler: it just ran re.search on every line,
with no string.find test and no early skip to the next line...

A function does the logfile evaluation. The function generates an HTML
page describing the results and conditionally e-mails certain
managers, depending on the errors it finds. When calling the function,
one passes in the list of strings (the log file content, generated by
Python's file.readlines() method), a flag-string telling what type of
log file it is (so the function knows what error codes to look for),
and the filename of the log file.

def HTMLIfy(aryStrings, strType, strShortName):
  # Count important notifications
  #   ...
  # Code skipped for brevity
  #   ...
  for aLine in rfContent:
    # re is expensive; apply it conditionally
    if aType=='val':
      idx = aLine.find('FrNfo:2401 ')
      if idx > -1:
        # The error number lets me know a record number follows in parens
        # Generate HTML link to that record number.
        print '\n\tFound 2401'
        mo = re.search(r'FrNfo:2401 \(Record# ([0-9]+)\)', aLine, re.IGNORECASE)
        # Next line shortened for brevity
        aLine=aLine[0:mo.start()-1]+'FrNFO:2401 (<a href="'+ ... +mo.group(1)+'</a> '+aLine[mo.end()+1:]
        aLine += '<br>'
        rfHandle.write(aLine)
        # Hit: skip the remaining checks and move on to the next line.
        # (pass would be a no-op here; continue is what skips ahead.)
        continue
      idx = aLine.find('Error Validator:2420')
      if idx > -1:
        LINKS_VNOQUERYHITS += 1
        aLine += '<br>'
        rfHandle.write(aLine)
        continue
      idx = aLine.find('Error Validator:2412')
      if idx > -1:
        LINKS_VNOJUMPDEST += 1
        aLine += '<br>'
        rfHandle.write(aLine)
        continue

At first I was just running re.search on every line. Time: ~5 hrs
for a 60K-line log file. Then I started testing with the string.find
method to make sure a certain error code appeared on the line before
running the "expensive" re.search, and that cut execution time from
~5 hrs to ~1.5 hrs: good, but "no cigar," as they say.
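
The pre-check described above can be sketched roughly as follows. This is
only an illustration in current Python 3, not the original program: the
function name find_record_number, the RECORD_RE constant and the sample
lines are made up for the example, and only the pattern comes from the
code quoted earlier.

```python
import re

# Pattern taken from the post; compiled once so it is not rebuilt per line.
RECORD_RE = re.compile(r'FrNfo:2401 \(Record# ([0-9]+)\)', re.IGNORECASE)

def find_record_number(line):
    """Return the record number as a string, or None if there is none."""
    # Cheap substring test first: most lines fail here and never
    # reach the more expensive regular expression.
    if 'FrNfo:2401 ' not in line:
        return None
    match = RECORD_RE.search(line)
    if match is None:
        return None
    return match.group(1)

# Hypothetical sample lines, invented for this sketch.
sample = [
    'Info: nothing to see here\n',
    'FrNfo:2401 (Record# 1187) missing jump\n',
]
print([find_record_number(s) for s in sample])  # [None, '1187']
```

The substring test plays the same role as the string.find call: it is a
plain scan with no pattern machinery, so it is a cheap way to rule out
lines that cannot match.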

Then I realized that each line of content could be treated the same as
you'd treat a switch() structure in C, or a case structure in Pascal:
if one option is a "hit", then you needn't even check for all the
other options. So I started skipping to the next line of the content
whenever I got a "hit" (in Python that takes a continue statement; pass
on its own does nothing). That cut my time down to under a minute for a
60K-line log file. Like any good idea, it's truly simple!
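
A minimal sketch of that "first hit wins" loop, again in current Python 3
and with made-up names (count_errors and the counts dictionary are
assumptions for the example; the two error codes are the ones quoted in
the code above):

```python
def count_errors(lines):
    """Count two error codes, stopping at the first hit on each line."""
    counts = {'2420': 0, '2412': 0}
    for line in lines:
        if 'Error Validator:2420' in line:
            counts['2420'] += 1
            continue  # hit: go straight to the next line
        if 'Error Validator:2412' in line:
            counts['2412'] += 1
            continue
        # Only lines that matched nothing reach this point.
    return counts

# Hypothetical log lines, invented for this sketch.
log = [
    'Error Validator:2420 query returned no hits\n',
    'Info: all fine\n',
    'Error Validator:2412 no jump destination\n',
]
print(count_errors(log))  # {'2420': 1, '2412': 1}
```

An if/elif chain would give the same short-circuit behaviour; continue is
shown here because it mirrors the "skip to the next line" wording.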

Terry


