[Tutor] Simple Stats on Apache Logs

Christian Witts cwitts at compuscan.co.za
Thu Feb 11 11:35:20 CET 2010


Lao Mao wrote:
> Hi,
>
> I have 3 servers which generate about 2G of webserver logfiles in a 
> day.  These are available on my machine over NFS.
>
> I would like to draw up some stats which shows, for a given keyword, 
> how many times it appears in the logs, per hour, over the previous week.
>
> So the behavior might be:
>
> $ ./webstats --keyword downloader
>
> Which would read from the logs (which it has access to) and produce 
> something like:
>
> Monday:
> 0000: 12
> 0100: 17
>
> etc
>
> I'm not sure how best to get started.  My initial idea would be to 
> filter the logs first, pulling out the lines with matching keywords, 
> then check the timestamp - maybe incrementing a dictionary if the 
> logfile was within a certain time?
>
> I'm not looking for people to write it for me, but I'd appreciate some 
> guidance as to the approach and algorithm.  Also what the simplest 
> presentation model would be.  Or even if it would make sense to stick 
> it in a database!  I'll post back my progress.
>
> Thanks,
>
> Laomao
grep -c <keyword> <file-mask, e.g. *.log>
or, if you only want the entries for a particular day (today, for example):
grep <date> <file-mask> | grep -c <keyword>

That would be the simplest implementation.  Note that grep -c counts 
matching lines, not every occurrence of the keyword.  For a Python 
implementation, think about dictionaries with multiple layers, like 
{Date: {Keyword1: Count, Keyword2: Count}}.  Essentially you would just 
iterate over the file, check whether each line contains the keyword(s) 
you are looking for, and increment the counter for it.
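
As a rough sketch of the nested-dictionary idea, something like the 
following could work.  It keys on date and then hour rather than keyword, 
since per-hour counts are what you asked for.  It assumes the logs carry 
timestamps in the usual Apache form, e.g. [11/Feb/2010:11:35:20 +0100]; 
the names count_keyword and report are made up for illustration, so 
adjust to suit your actual log format.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumption: timestamps look like [11/Feb/2010:11:35:20 +0100].
# Group 1 captures the date, group 2 the hour.
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}):(\d{2}):\d{2}:\d{2} [+-]\d{4}\]")


def count_keyword(lines, keyword):
    """Return {date: {hour: count}} for the lines containing keyword."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        if keyword not in line:
            continue
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # no recognisable timestamp, skip the line
        day, hour = match.groups()
        counts[day][hour] += 1
    return counts


def report(counts):
    """Format the counts as a list of lines: weekday name, then hours."""
    def as_date(day):
        return datetime.strptime(day, "%d/%b/%Y")

    out = []
    for day in sorted(counts, key=as_date):
        out.append(as_date(day).strftime("%A") + ":")
        for hour in sorted(counts[day]):
            out.append("%s00: %d" % (hour, counts[day][hour]))
    return out
```

Because count_keyword accepts any iterable of lines, you can pass it an 
open file object and it will read one line at a time, which matters with 
2G of logs a day.  Printing the result is then just 
print("\n".join(report(counts))).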

-- 
Kind Regards,
Christian Witts
Business Intelligence




