Parsing apache log files

Josiah Carlson jcarlson at nospam.uci.edu
Fri Feb 20 01:32:24 EST 2004


> In the meantime, is there some obvious method, or module that I have
> missed ? 

I use a regular expression:
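import re
# Raw strings keep the backslashes intact. The timezone group is closed
# with ")" before the escaped "\]", and accepts "+" as well as "-".
rexp = re.compile(r'(\d+\.\d+\.\d+\.\d+) - - \[([^\[\]:]+):'
                  r'(\d+:\d+:\d+) ([-+]\d\d\d\d)\] ("[^"]*") '
                  r'(\d+) (-|\d+) ("[^"]*") (".*")\s*\Z')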

a = rexp.match(line)  # line is a single line read from the log file
if a is not None:
    a.group(1)  # IP address
    a.group(2)  # day/month/year
    a.group(3)  # time of day
    a.group(4)  # timezone
    a.group(5)  # request
    a.group(6)  # status code: 200 for success, 404 for not found, etc.
    a.group(7)  # bytes transferred
    a.group(8)  # referrer
    a.group(9)  # browser
else:
    pass  # this line did not match

That should work for almost any line you get, but you may want to run it 
over a few megs of your logs to check whether the else branch is ever 
reached for a non-empty line.
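
A rough sketch of that check, in case it helps. The names LOG_RE, 
find_unmatched and sample are made up for illustration, the pattern is 
the one above with the timezone group closed properly, and accepting a 
"+" offset is an assumption on top of it:

```python
import re

# Same pattern as above, with the timezone group closed before the "\]".
# Accepting "+" as well as "-" in the offset is an assumption added here.
LOG_RE = re.compile(
    r'(\d+\.\d+\.\d+\.\d+) - - \[([^\[\]:]+):'
    r'(\d+:\d+:\d+) ([-+]\d\d\d\d)\] ("[^"]*") '
    r'(\d+) (-|\d+) ("[^"]*") (".*")\s*\Z'
)

def find_unmatched(lines):
    """Return (line_number, text) for each non-empty line the pattern rejects."""
    unmatched = []
    for number, line in enumerate(lines, 1):
        # Blank lines are skipped; only real content that fails is reported.
        if line.strip() and LOG_RE.match(line) is None:
            unmatched.append((number, line.rstrip('\n')))
    return unmatched

# Made-up sample data: one well-formed line, one blank line, one junk line.
sample = [
    '127.0.0.1 - - [20/Feb/2004:01:32:24 -0500] '
    '"GET / HTTP/1.0" 200 2326 "-" "Mozilla/4.0"\n',
    '\n',
    'this is not a log line\n',
]
print(find_unmatched(sample))
```

An open file object can be passed in place of the sample list, since 
the function only iterates over lines. Anything it reports is a line 
the pattern does not cover, for example one where the two "-" fields 
are filled in with an identity or a user name.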

  - Josiah


