Newbie tip: Prg speed from ~5 hrs to < 1 minute

Terry Byrne TerryByrne1963 at yahoo.com
Sat Apr 27 13:01:21 EDT 2002


All,

In a recent project I needed to read in a text file (a log file
generated by a utility program) and test each line of the file,
looking for certain error codes. I used file.readlines() to load the
log file into a list, then I used for aLine in aList: to test each
line.

One of my log files was ~60,000 lines long. At first I was using
re.search to find the relevant error codes. But it took ~5 hours to
process this log file, which was too slow. Knowing that a plain
substring search in the string module is much faster than an re
search, I switched to string.find(), and only if that found a certain
error code would I run the re operation. (re is needed
to obtain more specific error information.) That cut my time down to
~1 hr and 45 minutes, a big improvement but still too slow.
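
A minimal sketch of the "cheap substring test first, re only on a hit"
idea described above. The error code, the pattern, and the function name
are hypothetical, since the post does not show the real ones; the str
method find() is used here in place of the string module function.

```python
import re

# Hypothetical error code and detail pattern, invented for illustration.
ERROR_CODE = "ERR042"
DETAIL_RE = re.compile(r"ERR042 \(code=(\d+)\): (.*)")


def find_details(lines):
    """Return (code, message) tuples for lines that carry ERROR_CODE."""
    results = []
    for line in lines:
        # Cheap substring test; most lines fail here and never reach re.
        if line.find(ERROR_CODE) != -1:
            match = DETAIL_RE.search(line)
            if match:
                results.append((int(match.group(1)), match.group(2)))
    return results
```

How much this saves depends on how rare the error lines are: the fewer
lines that contain the code, the fewer re searches are run.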

Then I used the keyword "continue" (not "pass", which does nothing)
after each re, since each line can contain only one error; after
running an re it's not necessary to
look for other error codes at all. Now that 60,000-line log file takes
less than a minute for Python to handle. Good enough.
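
In Python, the statement that skips the remaining checks for the current
line is "continue" ("pass" by itself does nothing). A rough sketch of
that early exit, assuming two made-up error codes and patterns that are
not taken from the original log format:

```python
import re

# Hypothetical codes and patterns, invented for illustration.
E100_RE = re.compile(r"E100 file=(\S+)")
E200_RE = re.compile(r"E200 host=(\S+)")


def scan(lines):
    """Collect (code, detail) pairs, checking at most one code per line."""
    hits = []
    for line in lines:
        if "E100" in line:
            match = E100_RE.search(line)
            if match:
                hits.append(("E100", match.group(1)))
            continue  # one error per line: skip the remaining checks
        if "E200" in line:
            match = E200_RE.search(line)
            if match:
                hits.append(("E200", match.group(1)))
            continue
    return hits
```

With many error codes, the saving grows, because a matched line no
longer falls through every later test.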

I had no idea just how powerful and fast Python's re engine can be.
Having come from a Perl background, Python's way of doing regexes
annoyed me at first. But no more.

Terry


