[Tutor] Logfile Manipulation

Stephen Nelson-Smith sanelson at gmail.com
Mon Nov 9 17:10:44 CET 2009


On Mon, Nov 9, 2009 at 3:15 PM, Wayne Werner <waynejwerner at gmail.com> wrote:
> On Mon, Nov 9, 2009 at 7:46 AM, Stephen Nelson-Smith <sanelson at gmail.com>
> wrote:
>>
>> And the problem I have with the below is that I've discovered that the
>> input logfiles aren't strictly ordered - i.e. some of the entries are
>> out of order by a second or so.
>
> Within a given set of 10 lines, is the first line and last line "in order" -

On average, in a sequence of 10 log lines, one will be out of order by
one or two seconds.
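A rough way to put a number on that drift is to track the latest
timestamp seen so far and record how far any later entry falls behind
it. This is only a sketch: the function name max_lag_seconds and the
short sample list are made up for illustration, and the format string
assumes timestamps laid out like 05/Nov/2009:01:41:37 with English
month names.

```python
from datetime import datetime

# Assumed timestamp layout, matching entries such as 05/Nov/2009:01:41:37.
TS_FORMAT = "%d/%b/%Y:%H:%M:%S"


def max_lag_seconds(stamps):
    """Return the largest number of seconds any entry lags behind the
    latest timestamp seen before it (0 means already in order)."""
    latest = None
    worst = 0
    for text in stamps:
        ts = datetime.strptime(text, TS_FORMAT)
        if latest is None or ts > latest:
            latest = ts
        else:
            worst = max(worst, int((latest - ts).total_seconds()))
    return worst


# Hand-picked illustrative values, not a real log extract.
sample = [
    "05/Nov/2009:01:41:37",
    "05/Nov/2009:01:41:36",
    "05/Nov/2009:01:41:38",
    "05/Nov/2009:01:41:38",
    "05/Nov/2009:01:41:36",
]
print(max_lag_seconds(sample))  # prints 2
```

A result of one or two seconds would suggest that a small reordering
buffer is enough; a much larger figure would call for a bigger window.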

Here's a random slice:

05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:36
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:36
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:37
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:37
05/Nov/2009:01:41:38
05/Nov/2009:01:41:36
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:39
05/Nov/2009:01:41:38
05/Nov/2009:01:41:39
05/Nov/2009:01:41:39
05/Nov/2009:01:41:39
05/Nov/2009:01:41:39
05/Nov/2009:01:41:40
05/Nov/2009:01:41:40
05/Nov/2009:01:41:41

> I don't know
> what the default python sorting algorithm is on a list, but AFAIK you'd be
> looking at a constant O(log 10)

I'm not a mathematician - what does this mean, in layperson's terms?
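As far as I can tell, the point behind the quoted remark is that the
work done per log line depends only on the size of the window (10
here), not on how long the logfile is. The sketch below tries to make
that visible by counting comparisons while re-sorting a 10-item buffer
for inputs of different lengths. The Counted wrapper and
comparisons_per_line are invented for this illustration, random numbers
stand in for log entries, and the exact counts will vary between Python
versions.

```python
import random
from itertools import islice


class Counted(object):
    """Wrap a value and count every '<' comparison made on it."""
    comparisons = 0  # shared counter across all instances

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def comparisons_per_line(total_lines, window=10):
    """Average number of comparisons spent per input line."""
    Counted.comparisons = 0
    rng = random.Random(0)
    source = (Counted(rng.random()) for _ in range(total_lines))
    buffer = list(islice(source, window))
    for item in source:
        buffer.sort()        # re-sort the small window
        buffer.pop(0)        # take the smallest entry out
        buffer.append(item)  # add the next entry
    return Counted.comparisons / float(total_lines)


for n in (1000, 20000):
    print(n, round(comparisons_per_line(n), 1))
```

If the two averages come out about the same, the cost per line is not
growing with the size of the file, which is what "constant" means here.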

> import itertools
> log_generator = (d for d in logdata)
> mylist = list(itertools.islice(log_generator, 10))  # first ten values

OK

> while True:
>     try:
>         mylist.sort()

OK - sort the first 10 values.

>         nextdata = mylist.pop(0)

So the first value...

>         mylist.append(log_generator.next())

Right, so this will add one more value?

>     except StopIteration:
>         print 'done'
>         break
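
Putting the quoted pieces together, the loop might look something like
the sketch below. It is a reading of the suggestion rather than the
exact code proposed: the function name reorder is made up, it is
written as a generator so that each popped value is actually handed
back, it uses next-style iteration that works on current Python
versions, and it flushes whatever is left in the buffer once the input
runs out. It only gives fully sorted output if no entry is more than
`window` positions away from where it belongs.

```python
from itertools import islice


def reorder(entries, window=10):
    """Yield entries in sorted order, assuming no entry is more than
    `window` positions away from its sorted position."""
    source = iter(entries)
    buffer = list(islice(source, window))  # first `window` values
    for entry in source:
        buffer.sort()
        yield buffer.pop(0)   # smallest buffered value
        buffer.append(entry)  # refill the window
    # Input exhausted: flush what is still buffered, in order.
    buffer.sort()
    for entry in buffer:
        yield entry


print(list(reorder([3, 1, 2, 5, 4, 6], window=3)))  # [1, 2, 3, 4, 5, 6]
```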

> Or now that I look, python has a priority queue (
> http://docs.python.org/library/heapq.html ) that you could use instead. Just
> push the next value into the queue and pop one out - you give it some
> initial qty - 10 or so, and then it will always give you the smallest value.

That sounds very cool, and I see that one of the ActiveState recipes
Kent suggested uses heapq too. I'll have a play.
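
A first attempt with heapq might look like the sketch below. It keeps
a heap of `window` entries, pushes each new entry and pops the smallest
one. The name heap_reorder is made up, the format string assumes
timestamps like 05/Nov/2009:01:41:37 with English month names, and
real log lines would need the timestamp extracted first. Parsing into
datetime objects matters because comparing the raw strings would go
wrong across month boundaries.

```python
import heapq
from datetime import datetime
from itertools import count, islice

TS_FORMAT = "%d/%b/%Y:%H:%M:%S"  # assumed layout, e.g. 05/Nov/2009:01:41:37


def heap_reorder(stamps, window=10):
    """Yield timestamp strings in time order using a small heap.

    Only correct if no entry is more than `window` positions late."""
    seq = count()  # tie-breaker so equal timestamps keep arrival order
    keyed = ((datetime.strptime(s, TS_FORMAT), next(seq), s) for s in stamps)
    heap = list(islice(keyed, window))
    heapq.heapify(heap)
    for item in keyed:
        # Push the new entry, then pop whichever entry is now smallest.
        yield heapq.heappushpop(heap, item)[2]
    while heap:
        yield heapq.heappop(heap)[2]


stamps = [
    "05/Nov/2009:01:41:37",
    "05/Nov/2009:01:41:36",
    "05/Nov/2009:01:41:38",
    "05/Nov/2009:01:41:37",
]
for line in heap_reorder(stamps, window=3):
    print(line)
```

Compared with re-sorting a list on every line, the heap only does the
work needed to find the next smallest entry, so it should scale better
if the window has to grow.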

S.
-- 
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com

