Parsing apache log files
Josiah Carlson
jcarlson at nospam.uci.edu
Fri Feb 20 01:32:24 EST 2004
> In the meantime, is there some obvious method, or module that I have
> missed ?
I use a regular expression:
import re
rexp = re.compile(r'(\d+\.\d+\.\d+\.\d+) - - \[([^\[\]:]+):'
                  r'(\d+:\d+:\d+) -(\d\d\d\d)\] ("[^"]*") '
                  r'(\d+) (-|\d+) ("[^"]*") (".*")\s*\Z')
a = rexp.match(line)
if a is not None:
    a.group(1) #IP address
    a.group(2) #day/month/year
    a.group(3) #time of day
    a.group(4) #timezone digits (the pattern expects a '-' offset, e.g. -0500)
    a.group(5) #request, including the surrounding quotes
    a.group(6) #code 200 for success, 404 for not found, etc.
    a.group(7) #bytes transferred, or '-' if none
    a.group(8) #referrer
    a.group(9) #browser
else:
    #this line did not match.
    pass
That should work for almost any line you get, but you may want to run it
over a few megs of your logs to check whether the else branch is ever
reached for a non-empty line.
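
A minimal sketch of that check, assuming the log is in the combined format the pattern above targets. The names LOG_RE and check_lines are made up for illustration, and the pattern is the one from this post written with raw strings. It skips blank lines and collects every non-empty line the pattern rejects, along with its line number, so you can look at what fell through.

```python
import re

# Pattern from the post, written with raw strings.
# Note it only accepts negative timezone offsets such as -0500.
LOG_RE = re.compile(
    r'(\d+\.\d+\.\d+\.\d+) - - \[([^\[\]:]+):'
    r'(\d+:\d+:\d+) -(\d\d\d\d)\] ("[^"]*") '
    r'(\d+) (-|\d+) ("[^"]*") (".*")\s*\Z'
)

def check_lines(lines):
    """Return (matched_count, unmatched) for an iterable of log lines.

    unmatched is a list of (line_number, text) pairs for the non-empty
    lines that the pattern rejected.
    """
    matched = 0
    unmatched = []
    for number, line in enumerate(lines, 1):
        if not line.strip():
            continue  # blank lines are not interesting
        if LOG_RE.match(line) is not None:
            matched += 1
        else:
            unmatched.append((number, line.rstrip('\n')))
    return matched, unmatched
```

Since an open file iterates over its lines, you can pass one in directly, e.g. check_lines(open('access.log')), where access.log is a placeholder for your own log path. If unmatched comes back non-empty, the usual suspects are lines with a user name in place of the second '-', a '+' timezone offset, or a quote character inside the request.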
- Josiah