[Tutor] Simple Stats on Apache Logs
Christian Witts
cwitts at compuscan.co.za
Thu Feb 11 11:35:20 CET 2010
Lao Mao wrote:
> Hi,
>
> I have 3 servers which generate about 2G of webserver logfiles in a
> day. These are available on my machine over NFS.
>
> I would like to draw up some stats which show, for a given keyword,
> how many times it appears in the logs, per hour, over the previous week.
>
> So the behavior might be:
>
> $ ./webstats --keyword downloader
>
> Which would read from the logs (which it has access to) and produce
> something like:
>
> Monday:
> 0000: 12
> 0100: 17
>
> etc
>
> I'm not sure how best to get started. My initial idea would be to
> filter the logs first, pulling out the lines with matching keywords,
> then check the timestamp - maybe incrementing a dictionary if the
> log entry was within a certain time?
>
> I'm not looking for people to write it for me, but I'd appreciate some
> guidance as to the approach and algorithm. Also what the simplest
> presentation model would be. Or even if it would make sense to stick
> it in a database! I'll post back my progress.
>
> Thanks,
>
> Laomao
grep -c <keyword> <file-mask, e.g. *.log>
or, if you are only looking for today's entries for example, then
grep <date> <file-mask> | grep -c <keyword>
That would be the simplest implementation. For a Python implementation,
think about dictionaries with multiple layers like {Date: {Keyword1:
Count, Keyword2: Count}}. Essentially you would just iterate over the
file, check whether each line contains the keyword(s) you are looking
for, and then increment the counter for it.
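A minimal Python sketch of that layered dictionary, assuming the logs use Apache's Common or Combined Log Format timestamps (e.g. [11/Feb/2010:11:35:20 +0100]). To get the per-hour breakdown asked for in the question, the outer key here is a (date, hour) pair rather than a bare date. The names hourly_counts and report are made up for illustration. The cheap substring test runs before any timestamp parsing, so most non-matching lines are skipped quickly.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed format: Apache Common/Combined Log Format, e.g.
# 1.2.3.4 - - [11/Feb/2010:11:35:20 +0100] "GET /downloader HTTP/1.1" 200 512
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def hourly_counts(lines, keywords, since=None):
    """Return {(date, hour): {keyword: count}} for lines containing a keyword.

    `since`, if given, must be a timezone-aware datetime; older entries are skipped.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        # Cheap substring test first, so non-matching lines are never parsed.
        found = [k for k in keywords if k in line]
        if not found:
            continue
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # skip lines without a recognisable timestamp
        stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if since is not None and stamp < since:
            continue
        for keyword in found:
            counts[(stamp.date(), stamp.hour)][keyword] += 1
    return counts


def report(counts, keyword):
    """Print per-day, per-hour counts in the layout sketched in the question."""
    for day in sorted({day for day, _ in counts}):
        print(f"{day:%A}:")  # weekday name, e.g. Thursday
        for hour in range(24):
            n = counts.get((day, hour), {}).get(keyword, 0)
            if n:  # hours with no hits are omitted
                print(f"{hour:02d}00: {n}")
```

To cover the previous week, something like counts = hourly_counts(fileinput.input(glob.glob("/mnt/logs/*.log")), ["downloader"], since=datetime.now(timezone.utc) - timedelta(days=7)) followed by report(counts, "downloader") should do; the glob path is a placeholder for wherever the NFS share is mounted. `since` has to be timezone-aware because the parsed timestamps are, and hours are reported in each log line's own UTC offset.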
--
Kind Regards,
Christian Witts
Business Intelligence
C o m p u s c a n | Confidence in Credit