Parsing apache log files

Paul McGuire ptmcg at users.sourceforge.net
Fri Feb 20 04:18:01 EST 2004


"Jim Richardson" <warlock at eskimo.com> wrote in message
news:oqigg1-3nl.ln1 at grendel.myth...
>
> I am pulling apart some big apache logs (800-1000MB) for some analysis,
> and stuffing it into a MySQL database. Most of it goes ok, despite my
> meager coding abilities. But every so often I run across "borken" bits
> of data, like user agent strings that include "'/\ and such, although
> they are escaped by apache in writing the log, they break up my somewhat
> clunky splits.
>
The pyparsing examples directory includes an HTTP server log parser.  Run
against your data, it hit one minor error: the bytesSent field in the first
line was just a dash instead of an integer.  After correcting for that, I ran
it against your test lines and got this output:

fields.numBytesSent = -
fields.timestamp = ['16/Feb/2004:04:09:49', '-0800']
fields.clientSfw = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
fields.referrer = http://www.foobarp.org/theme_detail.php?type=vs&cat=0&mid=27512
fields.cmd = ['GET', '/ads/redirectads/336x280redirect.htm', 'HTTP/1.1']
fields.ipAddr = 111.111.111.11
fields.statusCode = 304

fields.numBytesSent = 541
fields.timestamp = ['16/Feb/2004:10:35:12', '-0800']
fields.clientSfw = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.20  [ru
fields.referrer = http://11.11.111.11/adframe.php?n=ad1f311a&what=zone:56
fields.cmd = ['GET', '/ads/redirectads/468x60redirect.htm', 'HTTP/1.1']
fields.ipAddr = 11.111.11.111
fields.statusCode = 200

Download pyparsing at http://pyparsing.sourceforge.net.

Here's the change you'll have to make to the example:

Change:
                       integer.setResultsName("statusCode") +
                       integer.setResultsName("numBytesSent")  +
to:
                       (integer | "-").setResultsName("statusCode") +
                       (integer | "-").setResultsName("numBytesSent")  +
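
One side effect of this change: numBytesSent (and statusCode) can now come
back as "-" rather than a number, so you will probably want to convert the
value before it goes into your MySQL insert. Here is a minimal sketch of that
step. The helper name int_or_none is made up for illustration and is not part
of pyparsing, and storing the dash as None (NULL) is my assumption about what
you want in the database; use 0 instead if that suits your analysis better.

```python
def int_or_none(field):
    """Return field as an int, or None when the log recorded a bare dash.

    Hypothetical helper, not part of pyparsing. It accepts either a
    string or an int, since the parsed value may arrive as either.
    """
    if field == "-":
        return None
    return int(field)


# Values as they appeared in the output above.
print(int_or_none("-"))    # first record: no byte count logged
print(int_or_none("541"))  # second record
```

Most Python database drivers write None as NULL, which keeps "no byte count
logged" distinct from a genuine zero-byte response.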

-- Paul




