Parsing Apache log files

Jim Richardson warlock at eskimo.com
Fri Feb 20 04:58:57 EST 2004


On Fri, 20 Feb 2004 09:18:01 GMT,
 Paul McGuire <ptmcg at users.sourceforge.net> wrote:
> "Jim Richardson" <warlock at eskimo.com> wrote in message
> news:oqigg1-3nl.ln1 at grendel.myth...
>>
>> I am pulling apart some big Apache logs (800-1000MB) for some analysis
>> and stuffing the results into a MySQL database. Most of it goes OK,
>> despite my meager coding abilities. But every so often I run across
>> "borken" bits of data, like user agent strings that include "'/\ and
>> such. Although Apache escapes them when writing the log, they break up
>> my somewhat clunky splits.
>>
> The pyparsing examples directory includes an HTTP server log parser.  When I
> ran it on your data, there was one minor error: the bytesSent field in the
> first line was just a dash instead of an integer.  After correcting for that,
> I ran it against your test lines and got this output:
>
> fields.numBytesSent = -
> fields.timestamp = ['16/Feb/2004:04:09:49', '-0800']
> fields.clientSfw = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
> fields.referrer = http://www.foobarp.org/theme_detail.php?type=vs&cat=0&mid=27512
> fields.cmd = ['GET', '/ads/redirectads/336x280redirect.htm', 'HTTP/1.1']
> fields.ipAddr = 111.111.111.11
> fields.statusCode = 304
>
> fields.numBytesSent = 541
> fields.timestamp = ['16/Feb/2004:10:35:12', '-0800']
> fields.clientSfw = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.20  [ru
> fields.referrer = http://11.11.111.11/adframe.php?n=ad1f311a&what=zone:56
> fields.cmd = ['GET', '/ads/redirectads/468x60redirect.htm', 'HTTP/1.1']
> fields.ipAddr = 11.111.11.111
> fields.statusCode = 200
>
> Download pyparsing at http://pyparsing.sourceforge.net.
>
> Here's the change you'll have to make to the example:
>
> Change:
>                        integer.setResultsName("statusCode") +
>                        integer.setResultsName("numBytesSent")  +
> to:
>                        (integer | "-").setResultsName("statusCode") +
>                        (integer | "-").setResultsName("numBytesSent")  +
>
> -- Paul
>
>

Now *this* looks interesting. Thanks a lot!
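
For comparison, the same two ideas (a byte count that may be "-" and
quoted fields that may contain backslash-escaped quotes) can be sketched
with only the standard library's re module. This is not the pyparsing
example mentioned above, just a rough stand-in. It assumes the logs are in
Apache's combined format; the field names are borrowed from the output
quoted above, and the two sample lines are reconstructed from that output
(with a made-up user agent in the second) rather than taken from the real
logs.

```python
import re

# Assumed layout (Apache "combined" format):
#   host ident user [time] "request" status bytes "referrer" "user-agent"
# Inside a quoted field, accept either an ordinary character or a
# backslash followed by any character, so \" does not end the field.
LINE_RE = re.compile(
    r'(?P<ipAddr>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<cmd>(?:[^"\\]|\\.)*)" '
    r'(?P<statusCode>\d{3}|-) '
    r'(?P<numBytesSent>\d+|-) '
    r'"(?P<referrer>(?:[^"\\]|\\.)*)" '
    r'"(?P<clientSfw>(?:[^"\\]|\\.)*)"'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # A lone dash means nothing was logged for that field.
    for key in ("statusCode", "numBytesSent"):
        fields[key] = None if fields[key] == "-" else int(fields[key])
    fields["cmd"] = fields["cmd"].split()
    fields["timestamp"] = fields["timestamp"].split()
    return fields


# Hypothetical sample lines, rebuilt from the fields shown in the post.
SAMPLE = (
    r'111.111.111.11 - - [16/Feb/2004:04:09:49 -0800] '
    r'"GET /ads/redirectads/336x280redirect.htm HTTP/1.1" 304 - '
    r'"http://www.foobarp.org/theme_detail.php?type=vs&cat=0&mid=27512" '
    r'"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"'
)
ODD_AGENT = (
    r'11.111.11.111 - - [16/Feb/2004:10:35:12 -0800] '
    r'"GET /ads/redirectads/468x60redirect.htm HTTP/1.1" 200 541 '
    r'"http://11.11.111.11/adframe.php?n=ad1f311a&what=zone:56" '
    r'"Mozilla/4.0 (compatible; \"odd\" agent)"'
)
```

Calling parse_line(SAMPLE) gives a dict whose numBytesSent is None, and
parse_line(ODD_AGENT) keeps the escaped quotes inside clientSfw instead of
splitting on them. Lines that do not fit the pattern come back as None, so
they can be counted or written to a reject file rather than breaking the
database load.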

-- 
Jim Richardson     http://www.eskimo.com/~warlock
" ... a language is just a dialect with an army and a navy."
                                -- Paul Tomblin, in a.s.r.


