[Tutor] Logfile multiplexing

Stephen Nelson-Smith sanelson at gmail.com
Tue Nov 10 11:04:16 CET 2009


I have the following idea for multiplexing logfiles (ultimately into heapq):

import gzip

class LogFile:
    def __init__(self, filename, date):
        self.logfile = gzip.open(filename, 'r')
        # Skip ahead to the first line whose timestamp starts with `date`.
        for logline in self.logfile:
            self.line = logline
            self.stamp = self.timestamp(self.line)
            if self.stamp.startswith(date):
                break

    def timestamp(self, line):
        # Fields 3 and 4 hold the timestamp, e.g. "[05/Nov/2009:04:02:07 +0000]".
        return " ".join(line.split()[3:5])

    def getline(self):
        # Return the current line and advance to the next one.
        nextline = self.line
        self.line = self.logfile.readline()
        self.stamp = self.timestamp(self.line)
        return nextline

The idea is that I can then do:

logs = [("log1", "[Nov/05/2009"),
        ("log2", "[Nov/05/2009"),
        ("log3", "[Nov/05/2009"),
        ("log4", "[Nov/05/2009")]

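To make the heapq part concrete, here is a rough sketch of the kind of merge I have in mind. It is only an illustration: stamp_of and merge_logs are hypothetical names, it works on any iterables of lines rather than on LogFile objects, and it compares timestamps as plain strings, which only gives the right order for lines within the same day. (Later Python versions have heapq.merge, but that is not available in the 2.4 I am using.)

```python
import heapq

def stamp_of(line):
    # Same field selection as LogFile.timestamp: fields 3 and 4 of the line.
    return " ".join(line.split()[3:5])

def merge_logs(line_iterables):
    """Yield lines from several already-sorted sources in timestamp order."""
    heap = []
    for index, lines in enumerate(line_iterables):
        it = iter(lines)
        # Prime the heap with the first line of each source, if it has one.
        for line in it:
            # index is unique per source, so ties on the stamp are settled
            # there and the line/iterator are never compared.
            heapq.heappush(heap, (stamp_of(line), index, line, it))
            break
    while heap:
        stamp, index, line, it = heapq.heappop(heap)
        yield line
        # Refill from the source we just consumed from.
        for nxt in it:
            heapq.heappush(heap, (stamp_of(nxt), index, nxt, it))
            break
```

Each source contributes at most one entry to the heap at a time, so memory use stays at one line per logfile.
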
I've tested it with one log (15 MB compressed, 211 MB uncompressed), and
it takes about 20 seconds before it is ready to use.

However, then I get unexpected behaviour:

~/system/tools/magpie $ python
Python 2.4.3 (#1, Jan 21 2009, 01:11:33)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import magpie
>>> magpie.l
<magpie.LogFile instance at 0x2b8045765bd8>
>>> magpie.l.stamp
'[05/Nov/2009:04:02:07 +0000]'
>>> magpie.l.getline()
89.151.119.195 - - [05/Nov/2009:04:02:07 +0000] "GET
/service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812
HTTP/1.1" 200 50 "-" "-"

'89.151.119.195 - - [05/Nov/2009:04:02:07 +0000] "GET
/service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812
HTTP/1.1" 200 50 "-" "-"\n'
>>> magpie.l.stamp
''
>>> magpie.l.getline()

''
>>>

I expected to be able to call getline() and get more lines...

a) What have I done wrong?
b) Is this an OK implementation?  What improvements could be made?
c) Is 20 seconds a reasonable time, or am I choosing a slow way to do this?
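
One thing that may be worth checking, going only by the session above: the stamp is printed day first ('[05/Nov/2009...'), while the date strings in the logs list are month first ('[Nov/05/2009'). Assuming magpie.l was built with one of those date strings, startswith() would never match, the loop in __init__ would run to the end of the file, and readline() would return '' from then on. A small check using only strings from this post (the day-first prefix is a hypothetical alternative, not something I have tried against the real log):

```python
stamp = '[05/Nov/2009:04:02:07 +0000]'   # value of magpie.l.stamp in the session
wanted = '[Nov/05/2009'                  # date string from the logs list

# Does the prefix test in __init__ ever succeed with the date as given?
matches_as_given = stamp.startswith(wanted)

# Hypothetical day-first prefix, matching the order used in the log itself.
matches_day_first = stamp.startswith('[05/Nov/2009')

print(matches_as_given, matches_day_first)
```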

S.

