[Tutor] Logfile Manipulation
Stephen Nelson-Smith
sanelson at gmail.com
Mon Nov 9 09:58:29 CET 2009
On Mon, Nov 9, 2009 at 8:47 AM, Alan Gauld <alan.gauld at btinternet.com> wrote:
> I'm not familiar with Apache log files so I'll let somebody else answer,
> but I suspect you can either use string.split() or a re.findall(). You might
> even be able to use csv. Or if they are in XML you could use ElementTree.
> It all depends on the data!
An Apache logfile entry looks like this:
89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] "GET /service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812 HTTP/1.1" 200 50 "-" "-"
I want to extract 24 hours of data based on timestamps like this:
[04/Nov/2009:04:02:10 +0000]
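As a rough sketch of the timestamp handling, assuming Python 3 and an
English (or C) locale so that %b matches "Nov": the bracketed field can
be parsed with datetime.strptime and compared against a 24-hour window.
The names parse_timestamp and in_window and the window start date are
made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed layout of the bracketed field, e.g. [04/Nov/2009:04:02:10 +0000].
# %b only matches English month abbreviations under an English/C locale.
TIMESTAMP_FORMAT = "[%d/%b/%Y:%H:%M:%S %z]"


def parse_timestamp(field):
    """Turn the bracketed timestamp field into a timezone-aware datetime."""
    return datetime.strptime(field, TIMESTAMP_FORMAT)


def in_window(ts, start, hours=24):
    """Return True if ts falls in the half-open window [start, start + hours)."""
    return start <= ts < start + timedelta(hours=hours)


# Hypothetical window: all of 4 Nov 2009, UTC.
start = datetime(2009, 11, 4, tzinfo=timezone.utc)
ts = parse_timestamp("[04/Nov/2009:04:02:10 +0000]")
print(in_window(ts, start))
```

Because the parsed value carries its UTC offset, entries logged with
different offsets can still be compared against the same window.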
I also need to do some filtering (e.g. I actually don't want anything
with service.php), and I also have to do some substitutions. That's
trivial, except that I don't know the best place to do it. I.e., should
I do multiple passes? Or should I try to do all the work at once,
viewing each line only once? Also, what about reading from compressed
files? The data comes in as 6 gzipped logfiles which expand to 6 GB in
total.
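
To make the single-pass option concrete, here is a minimal sketch,
assuming Python 3. It reads the gzipped files line by line without
expanding them on disk, and does the filtering, the time-window check
and the substitutions while each line is in hand. The function names,
the substitutions argument and the choice to skip unparseable lines are
all assumptions for illustration, not a settled design.

```python
import gzip
import re
from datetime import datetime, timedelta

# First bracketed field on the line, e.g. [04/Nov/2009:04:02:10 +0000].
TIMESTAMP_RE = re.compile(r"\[([^\]]+)\]")
TIMESTAMP_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # %b assumes an English/C locale


def read_gzipped(paths):
    """Yield lines from each gzipped file in turn, decompressing as we go."""
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            yield from handle


def filter_lines(lines, start, hours=24, exclude="service.php", substitutions=()):
    """Yield wanted lines in one pass: filter, check the window, substitute."""
    end = start + timedelta(hours=hours)
    for line in lines:
        if exclude in line:
            continue  # cheap substring test first
        match = TIMESTAMP_RE.search(line)
        if match is None:
            continue  # no timestamp field; skipped here by assumption
        try:
            ts = datetime.strptime(match.group(1), TIMESTAMP_FORMAT)
        except ValueError:
            continue  # malformed timestamp; skipped here by assumption
        if not (start <= ts < end):
            continue
        for old, new in substitutions:
            line = line.replace(old, new)
        yield line
```

Both functions are generators, so something like
filter_lines(read_gzipped(paths), start) holds only one line in memory
at a time, whatever the total size. Putting the substring test before
the timestamp parsing means the unwanted service.php lines never pay
for strptime.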
S.