Parsing apache log files

Jim Richardson warlock at eskimo.com
Fri Feb 20 04:54:37 EST 2004


On Thu, 19 Feb 2004 22:32:24 -0800,
 Josiah Carlson <jcarlson at nospam.uci.edu> wrote:
>> In the meantime, is there some obvious method, or module that I have
>> missed ? 
>
> I use a regular expression:
> import re
> rexp = re.compile(r'(\d+\.\d+\.\d+\.\d+) - - \[([^\[\]:]+):'
>                    r'(\d+:\d+:\d+) -(\d\d\d\d)\] ("[^"]*") '
>                    r'(\d+) (-|\d+) ("[^"]*") (".*")\s*\Z')
>
> a = rexp.match(line)
> if a is not None:
>      a.group(1) #IP address
>      a.group(2) #day/month/year
>      a.group(3) #time of day
>      a.group(4) #timezone
>      a.group(5) #request
>      a.group(6) #code 200 for success, 404 for not found, etc.
>      a.group(7) #bytes transferred
>      a.group(8) #referrer
>      a.group(9) #browser
> else:
>      pass #this line did not match.
>
> That should work for almost any line you get, but you may want to run it 
> over a few megs of your logs just to check whether that else 
> branch is ever reached for a non-empty line.
>
>   - Josiah


Thanks, although reading that regex makes my brain hurt! :) I don't
think it handles the case where the dashes are something else (each dash
is a placeholder for some data that wasn't there on this request:
byte length, referrer, something), but I'll look into it. Thanks for the
example.
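
For what it's worth, here is a rough sketch of how the placeholder case
might be handled. It assumes the log is in Apache's "combined" format and
that the two fields after the host (ident and user) never contain spaces.
The names LOG_RE and parse_line, the group names, and the sample line are
all made up for illustration and are not from Josiah's post.

```python
import re

# Assumption: Apache "combined" log format. Each of ident, user, size and
# referrer may be a literal "-" when the value was not available.
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^:\]]+):(?P<time>\d+:\d+:\d+) (?P<tz>[+-]\d{4})\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>-|\d+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>.*)"\s*\Z'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Turn the "-" placeholder into None so callers can test for it.
    for key in ('ident', 'user', 'size', 'referrer'):
        if fields[key] == '-':
            fields[key] = None
    return fields

# Hypothetical sample line, not taken from a real log.
sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" '
          '"Mozilla/4.08 [en] (Win98; I ;Nav)"')
fields = parse_line(sample)
```

Named groups keep the field list readable, and the timezone group accepts
either sign. A user name containing a space would still fail to match, so
it is worth checking the non-matching lines on real logs.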

-- 
Jim Richardson     http://www.eskimo.com/~warlock
Ok, the guy who made the netfilter Makefile was probably on some really
interesting and probably highly illegal drugs when he wrote it.
	-- Linus Torvalds 


