[Tutor] Logfile multiplexing
Stephen Nelson-Smith
sanelson at gmail.com
Tue Nov 10 11:04:16 CET 2009
I have the following idea for multiplexing logfiles (ultimately into heapq):
import gzip

class LogFile:
    def __init__(self, filename, date):
        self.logfile = gzip.open(filename, 'r')
        for logline in self.logfile:
            self.line = logline
            self.stamp = self.timestamp(self.line)
            if self.stamp.startswith(date):
                break

    def timestamp(self, line):
        return " ".join(self.line.split()[3:5])

    def getline(self):
        nextline = self.line
        self.line = self.logfile.readline()
        self.stamp = self.timestamp(self.line)
        return nextline
The idea is that I can then do:
logs = [("log1", "[Nov/05/2009"), ("log2", "[Nov/05/2009"),
        ("log3", "[Nov/05/2009"), ("log4", "[Nov/05/2009")]
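A minimal sketch of how several such logs might eventually be fed through heapq, assuming each log is already in timestamp order and all lines fall on the same day (so comparing the stamp strings gives time order). Small in-memory lists stand in for the gzipped files, and `stamp_of` and `merge_logs` are hypothetical names, not part of the class above.

```python
import heapq

def stamp_of(line):
    # Same fields that LogFile.timestamp() extracts,
    # e.g. '[05/Nov/2009:04:02:07 +0000]'.
    return " ".join(line.split()[3:5])

def merge_logs(sources):
    """Yield lines from several already-sorted iterables in stamp order."""
    heap = []
    iterators = [iter(source) for source in sources]
    # Prime the heap with the first line of each source, if it has one.
    for index, iterator in enumerate(iterators):
        for line in iterator:
            heapq.heappush(heap, (stamp_of(line), index, line))
            break  # only one line per source for now
    while heap:
        stamp, index, line = heapq.heappop(heap)
        yield line
        # Refill from the source that just supplied the smallest line.
        for following in iterators[index]:
            heapq.heappush(heap, (stamp_of(following), index, following))
            break

# Invented sample lines in the same layout as the log shown in this post.
log1 = [
    'a - - [05/Nov/2009:04:02:07 +0000] "GET /one HTTP/1.1" 200 50\n',
    'a - - [05/Nov/2009:04:02:09 +0000] "GET /three HTTP/1.1" 200 50\n',
]
log2 = [
    'b - - [05/Nov/2009:04:02:08 +0000] "GET /two HTTP/1.1" 200 50\n',
]
merged = list(merge_logs([log1, log2]))
```

The heap never holds more than one line per log, so memory use stays small however large the files are. The source index in each tuple breaks ties between equal stamps.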
I've tested it with one log (15M compressed, 211M uncompressed), and
it takes about 20 seconds to be ready to roll.
However, then I get unexpected behaviour:
~/system/tools/magpie $ python
Python 2.4.3 (#1, Jan 21 2009, 01:11:33)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import magpie
>>> magpie.l
<magpie.LogFile instance at 0x2b8045765bd8>
>>> magpie.l.stamp
'[05/Nov/2009:04:02:07 +0000]'
>>> magpie.l.getline()
89.151.119.195 - - [05/Nov/2009:04:02:07 +0000] "GET
/service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812
HTTP/1.1" 200 50 "-" "-"
'89.151.119.195 - - [05/Nov/2009:04:02:07 +0000] "GET
/service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812
HTTP/1.1" 200 50 "-" "-"\n'
>>> magpie.l.stamp
''
>>> magpie.l.getline()
''
>>>
I expected to be able to call getline() and get more lines...
a) What have I done wrong?
b) Is this an ok implementation? What improvements could be made?
c) Is 20 seconds a reasonable time, or am I choosing a slow way to do this?
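One possible explanation for (a), offered as an assumption rather than a diagnosis: the prefixes in `logs` are written month-first ("[Nov/05/2009"), while the stamp in the session is day-first ("[05/Nov/2009"). If the prefix passed to the constructor never matches, the `for` loop reads to the end of the file, the last line is left in `self.line`, and every later `readline()` returns an empty string. The sketch below reproduces that pattern with `io.StringIO` standing in for the gzipped file. It is written for a current Python 3 rather than the 2.4 in the session, and `seek_to_date` is a hypothetical helper.

```python
import io

def seek_to_date(logfile, date):
    """Return the first line whose stamp starts with date.

    If no line matches, the loop runs to end of file and the last
    line read is returned, mirroring the loop in LogFile.__init__.
    """
    line = ""
    for line in logfile:
        stamp = " ".join(line.split()[3:5])
        if stamp.startswith(date):
            break
    return line

# Invented sample lines in the same layout as the log shown in this post.
SAMPLE = (
    'h - - [05/Nov/2009:04:02:06 +0000] "GET /a HTTP/1.1" 200 50\n'
    'h - - [05/Nov/2009:04:02:07 +0000] "GET /b HTTP/1.1" 200 50\n'
)

# A day-first prefix matches the first line, so more lines remain.
matching = io.StringIO(SAMPLE)
first = seek_to_date(matching, "[05/Nov/2009")
after_match = matching.readline()

# A month-first prefix never matches: the loop consumes the whole
# file, and readline() then returns '' because nothing is left.
missing = io.StringIO(SAMPLE)
last = seek_to_date(missing, "[Nov/05/2009")
after_miss = missing.readline()
```

If this is what happened, it would also bear on (c): the 20 seconds would be the time needed to decompress and scan the entire file, not the time to reach the first matching line.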
S.