[Tutor] Logfile Manipulation

Stephen Nelson-Smith sanelson at gmail.com
Mon Nov 9 10:36:21 CET 2009


Sorry - forgot to include the list.

On Mon, Nov 9, 2009 at 9:33 AM, Stephen Nelson-Smith <sanelson at gmail.com> wrote:
> On Mon, Nov 9, 2009 at 9:10 AM, ALAN GAULD <alan.gauld at btinternet.com> wrote:
>>
>>> An Apache logfile entry looks like this:
>>>
>>> 89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] "GET
>>> /service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812
>>> HTTP/1.1" 200 50 "-" "-"
>>>
>>> I want to extract 24 hrs of data based on timestamps like this:
>>>
>>> [04/Nov/2009:04:02:10 +0000]
>>
>> OK It looks like you could use a regex to extract the first
>> thing you find between square brackets. Then convert that to a time.
>
> I'm currently thinking I can just use a string comparison after the
> first entry for the day - that saves date arithmetic.
>
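
For reference, here is a minimal sketch of the regex-then-convert idea,
in case the plain string comparison turns out not to be enough. It
assumes the timestamp is the first non-empty bracketed field on the
line and that month names are in English (%b depends on the locale).
The names parse_timestamp and in_window are made up for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# First non-empty bracketed field, e.g. [04/Nov/2009:04:02:10 +0000].
# The empty "[]" pairs inside the request URL do not match this pattern.
TIMESTAMP_RE = re.compile(r"\[([^\]]+)\]")
TIMESTAMP_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_timestamp(line):
    """Return the entry's timestamp as an aware datetime, or None."""
    match = TIMESTAMP_RE.search(line)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), TIMESTAMP_FORMAT)
    except ValueError:
        # A bracketed field that is not a timestamp in the assumed format.
        return None


def in_window(line, start, hours=24):
    """True if the line's timestamp falls in [start, start + hours)."""
    stamp = parse_timestamp(line)
    if stamp is None:
        return False
    return start <= stamp < start + timedelta(hours=hours)


sample = (
    '89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] '
    '"GET /service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812 '
    'HTTP/1.1" 200 50 "-" "-"'
)
start = datetime(2009, 11, 4, tzinfo=timezone.utc)
print(parse_timestamp(sample), in_window(sample, start))
```

Parsing costs more per line than comparing the date prefix as a string,
but it copes with entries that carry different UTC offsets.
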
>> I'd opt for doing it all in one pass. With such large files you really
>> want to minimise the amount of time spent reading the file.
>> Plus with such large files you will need/want to process them
>> line by line anyway rather than reading the whole thing into memory.
>
> How do I handle concurrency?  I have 6 log files which I need to turn
> into one time-sequenced log.
>
> I guess I need to switch between the logs, always taking whichever
> has the earliest next entry across all six.  Then on a per-line basis
> I can also reject a line if it matches the stuff I want to throw out,
> substitute it if I need to, and then write it out to the new file.
>
> S.
>
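
One possible way to do the merge, sketched here with the standard
library's heapq.merge, which lazily interleaves inputs that are already
sorted. This assumes each log is itself in time order and that the
timestamps look like the example above. keyed_lines, merge_logs and the
keep filter are hypothetical names, not anything from the thread.

```python
import heapq
import re
from datetime import datetime

TIMESTAMP_RE = re.compile(r"\[([^\]]+)\]")
TIMESTAMP_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def keyed_lines(lines):
    """Yield (timestamp, line) pairs, skipping lines without a parsable timestamp."""
    for line in lines:
        match = TIMESTAMP_RE.search(line)
        if match is None:
            continue
        try:
            stamp = datetime.strptime(match.group(1), TIMESTAMP_FORMAT)
        except ValueError:
            continue
        yield stamp, line


def merge_logs(sources, keep=lambda line: True):
    """Lazily merge time-ordered line iterables into one time-ordered stream.

    Only one pending line per source is held in memory at a time.
    """
    for _stamp, line in heapq.merge(*(keyed_lines(src) for src in sources)):
        if keep(line):
            yield line


# Two tiny in-memory "logs" standing in for open file objects.
log_a = [
    'a - - [04/Nov/2009:04:02:10 +0000] "GET /one HTTP/1.1" 200 50\n',
    'a - - [04/Nov/2009:04:02:12 +0000] "GET /three HTTP/1.1" 200 50\n',
]
log_b = [
    'b - - [04/Nov/2009:04:02:11 +0000] "GET /two HTTP/1.1" 200 50\n',
    'b - - [04/Nov/2009:04:02:13 +0000] "GET /skip HTTP/1.1" 200 50\n',
]

merged = list(merge_logs([log_a, log_b], keep=lambda line: "/skip" not in line))
for line in merged:
    print(line, end="")
```

With real files, the sources would be the six open file objects, since
iterating a file yields one line at a time. Rejecting lines fits in the
keep callback; substitution could be applied to each line just before it
is written out.
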



-- 
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com

