Scratchy 0.6 - An Apache access_log file parser and HTML report generator

Phil stuff@nonpoint.mailzilla.net
4 Jul 2003 06:18:01 -0700


http://scratchy.sourceforge.net

Changes since 0.5:
* Added country name lookup support
* Added COUNTRY_CACHE config file option (please read the important
note).
* Code cleanup. Created a FileBase class that file_tracker, dns_cache
and country_cache all derive from.
* Template files are now configurable
* Each chart can be disabled (default is enabled)
* Added accessed files table
* Added CHARSET_ENCODING to config.
* Added additional file types (swf, ra, pdf, php)
* Addressed issue with ylabel_density in chart output
* Added additional browsers, robots and search engines
* Added some additional search engines
* Added additional robots
* Fixed daily chart display if days > 25
* Fixed report, which was ignoring preferences for search
phrases/keywords
* Fixed erroneous reporting of "Unix" as the operating system for all
X11 derivatives.
* Fixed bug in creating summary when gdchart was not installed.
* Added config file option for CHART_HEIGHT
* Catches chart draw() exceptions
* makedirs no longer produces information if dirs already exist
* Added a "/usr/bin/env python" interpreter line to parse.py and report.py
* File types are truncated to at most 10 chars.

About Scratchy
Scratchy is a set of scripts to parse Apache web server log files and
extract useful information. From this data, Scratchy will create HTML
reports so that website administrators can easily view the information
and determine trends and their typical audience.
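
As a rough illustration of the kind of parsing involved, here is a minimal
sketch that splits one access_log line into named fields. It assumes the
standard Apache "combined" log format; the names COMBINED_RE and parse_line
are hypothetical and are not taken from Scratchy's own parse.py, which may
work quite differently.

```python
import re

# Assumption: the log uses Apache's "combined" format. The referer and
# user-agent fields are optional so "common" format lines also match.
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMBINED_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # Apache writes "-" for the size when no response body was sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

# Illustrative sample line, not taken from a real log.
sample = ('127.0.0.1 - - [04/Jul/2003:06:18:01 -0700] '
          '"GET /index.html HTTP/1.0" 200 2326 '
          '"http://example.com/start.html" "Mozilla/4.08 [en] (X11; Linux)"')
record = parse_line(sample)
```

This sketch does not handle escaped quotes inside the request or user-agent
fields; a real parser would need to account for those.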

Scratchy began as a proof-of-concept which allowed me to compile stats
about my personal website. As time progressed I continually added
features and improvements, and I now feel it has reached a point where
it would be useful to others.

Why Scratchy?
Well, the name of the project of course comes from the Simpsons "Itchy
and Scratchy Show". The project aims to be a complete log parsing and
report generating tool. Also, there seemed to be a need for such a
project in Python. I have seen some other Apache log parsers but they
were developed in other languages (such as Perl, C, etc). One goal of
this project is for it to be extensible. To that end, most of the
report appearance can be easily modified by tweaking a single config
file.

What information does Scratchy report?
* accessed web pages
* hosts accessing your website
* operating systems
* browsers
* search engines
* robots/spiders
* file types accessed
* errors
* countries
* a trace of pages accessed by each IP address (if enabled)
* charts for many of the tables (if enabled)
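
To give a feel for how one of these tables might be tallied, the sketch
below counts file types from a list of request strings. It is only an
illustration under stated assumptions: the helper file_type and the constant
MAX_TYPE_LEN are hypothetical names, the sample requests are made up, and
the 10-character limit simply mirrors the truncation mentioned in the
change list. Scratchy's actual report code may be organized differently.

```python
from collections import Counter
import posixpath

# Assumption: mirrors the "at most 10 chars" truncation from the change list.
MAX_TYPE_LEN = 10

def file_type(request):
    """Return the lowercased extension of the requested path, or "" if none."""
    parts = request.split()
    if len(parts) < 2:
        return ""
    path = parts[1].split("?", 1)[0]   # drop any query string
    ext = posixpath.splitext(path)[1]  # e.g. ".html"
    return ext[1:].lower()[:MAX_TYPE_LEN]

# Made-up request strings in the form "METHOD path PROTOCOL".
requests = [
    "GET /index.html HTTP/1.0",
    "GET /docs/manual.pdf HTTP/1.0",
    "GET /index.html?ref=1 HTTP/1.0",
    "GET /movie.swf HTTP/1.0",
]
counts = Counter(file_type(r) for r in requests)
```

The same Counter approach would extend to hosts, browsers or status codes by
swapping in a different key function.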