Newbie tip: Prg speed from ~5 hrs to < 1 minute
Terry Byrne
TerryByrne1963 at yahoo.com
Mon Apr 29 10:18:24 EDT 2002
Fernando,
With pleasure. Here is some detail on the 'after' version, I hope not
too much! The 'before' was simpler: it just ran re.search on every
line, with no string.find test and no early skip to the next line...
A function does the logfile evaluation. The function generates an HTML
page describing the results and conditionally e-mails certain
managers, depending on the errors it finds. When calling the function,
one passes in the list of strings (the log file content, generated by
Python's file.readlines() method), a flag-string telling what type of
log file it is (so the function knows what error codes to look for),
and the filename of the log file.
def HTMLIfy(aryStrings, strType, strShortName):
    # Count important notifications
    # ...
    # Code skipped for brevity
    # ...
    for aLine in rfContent:
        # re is expensive; apply it conditionally
        if aType == 'val':
            idx = aLine.find('FrNfo:2401 ')
            if idx > -1:
                # The error number lets me know a record number follows
                # in parens. Generate HTML link to that record number.
                print '\n\tFound 2401'
                mo = re.search(r'FrNfo:2401 \(Record# ([0-9]+)\)', aLine,
                               re.IGNORECASE)
                # Next line shortened for brevity
                aLine=aLine[0:mo.start()-1]+'FrNFO:2401 (<a href="'+ ...
                    +mo.group(1)+'</a> '+aLine[mo.end()+1:]
                aLine += '<br>'
                rfHandle.write(aLine)
                # Line handled: skip the remaining tests. This has to be
                # continue; pass is a no-op and would fall through.
                continue
            idx = aLine.find('Error Validator:2420')
            if idx > -1:
                LINKS_VNOQUERYHITS += 1
                aLine += '<br>'
                rfHandle.write(aLine)
                continue
            idx = aLine.find('Error Validator:2412')
            if idx > -1:
                LINKS_VNOJUMPDEST += 1
                aLine += '<br>'
                rfHandle.write(aLine)
                continue
At first I was just running re.search on each line. Time: ~5 hrs for
a 60K-line log file. I then started testing with the string.find
method to make sure a certain error code appeared on the line before
running the "expensive" re.search, and that cut execution time from
~5 hrs to ~1.5 hrs: good, but "no cigar," as they say.
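
A minimal sketch of that guard, for illustration only: a cheap substring test decides whether the regular expression runs at all. The function name find_record_numbers and the sample lines are hypothetical; only the error code and the pattern come from the code above.

```python
import re

# Pattern taken from the post; compiled once so the loop does not rebuild it.
RECORD_RE = re.compile(r'FrNfo:2401 \(Record# ([0-9]+)\)', re.IGNORECASE)

def find_record_numbers(lines):
    """Return the record numbers on lines that mention FrNfo:2401.

    Hypothetical helper: it only demonstrates the prefilter idea.
    """
    records = []
    for line in lines:
        # Cheap test first: most lines fail here and never reach the regex.
        if line.find('FrNfo:2401 ') > -1:
            mo = RECORD_RE.search(line)
            if mo:  # the code may appear without a record number
                records.append(mo.group(1))
    return records

# Made-up sample data, not taken from a real log file.
sample = [
    'Info: nothing to see here\n',
    'FrNfo:2401 (Record# 1234) missing field\n',
    'Error Validator:2420 no query hits\n',
]
print(find_record_numbers(sample))
```

The gain depends on how rare the error code is: the fewer lines that contain it, the more regex calls the substring test saves.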
Then I realized that each line of content could be treated the same as
you'd treat a switch() structure in C, or a case structure in Pascal:
if one option is a "hit", then you needn't even check for all the
other options. So I started skipping straight to the next line of the
content whenever I got a "hit" (in Python the statement that does this
is continue; pass on its own does nothing). That cut my time down to
under a minute for a 60K-line log file. Like any good idea, it's truly
simple!
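
The "first hit wins" idea can be sketched as below. This is an illustration under assumptions, not the original script: the function name count_validator_errors, the local counters, and the sample lines are hypothetical, while the two error strings are the ones tested in the code above.

```python
def count_validator_errors(lines):
    """Count two validator errors, stopping at the first test that hits.

    Hypothetical helper: it mirrors the structure of the loop in the post
    but keeps its counters local instead of using module-level names.
    """
    no_query_hits = 0
    no_jump_dest = 0
    for line in lines:
        if line.find('Error Validator:2420') > -1:
            no_query_hits += 1
            continue  # handled; skip every remaining test for this line
        if line.find('Error Validator:2412') > -1:
            no_jump_dest += 1
            continue
        # Lines that match nothing fall through to here.
    return no_query_hits, no_jump_dest

# Made-up sample data, not taken from a real log file.
sample = [
    'Error Validator:2420 no query hits\n',
    'Info: all good\n',
    'Error Validator:2412 no jump destination\n',
    'Error Validator:2412 no jump destination\n',
]
print(count_validator_errors(sample))
```

An if/elif chain gives the same short-circuit behaviour; continue is used here because it matches the flat sequence of tests in the post.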
Terry