[Tutor] Logfile Manipulation

Stephen Nelson-Smith sanelson at gmail.com
Mon Nov 9 09:58:29 CET 2009


On Mon, Nov 9, 2009 at 8:47 AM, Alan Gauld <alan.gauld at btinternet.com> wrote:

> I'm not familiar with Apache log files so I'll let somebody else answer,
> but I suspect you can either use string.split() or a re.findall(). You might
> even be able to use csv. Or if they are in XML you could use ElementTree.
> It all depends on the data!

An apache logfile entry looks like this:

89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] "GET
/service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812
HTTP/1.1" 200 50 "-" "-"

I want to extract 24 hrs of data based timestamps like this:

[04/Nov/2009:04:02:10 +0000]

I also need to do some filtering (eg I actually don't want anything
with service.php), and I also have to do some substitutions - that's
trivial other than not knowing the optimum place to do it?  IE should
I do multiple passes?  Or should I try to do all the work at once,
only viewing each line once?  Also what about reading from compressed
files?  The data comes in as 6 gzipped logfiles which expand to 6G in
total.

S.


More information about the Tutor mailing list