Basic questions regarding parsing logfiles
Anton Muhin
antonmuhin at sendmail.ru
Thu May 15 04:05:45 EDT 2003
Ben S wrote:
> Hi,
>
> Basically I'm trying to use Python to parse and process some log files
> produced by a game. They take the fairly standard format of a time stamp
> in the form of "Tue May 13 12:52:59 2003", and then some more
> information. A few questions:
>
> I load the file like this:
>
> logFile = file(filename, 'rU')
> lines = map(string.strip, logFile.readlines())
>
> Is that a reasonably good way to do it (while stripping excess
> whitespace too)?
I suppose, yes. However, I'm more used to the following idiom:
logFile = file(filename)  # 'r' is the default, and I don't understand 'U'
for line in logFile:
    line = line.strip()
    # do something....
logFile.close()
This variant reads one line at a time instead of loading the whole file
with readlines(), so it may need less memory, which can matter if you
are processing huge files.
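For anyone reading this with a current interpreter, the line-by-line loop might be sketched as follows. This assumes Python 3, where open() replaces file() and text mode already translates line endings (which is what 'U' asked for in Python 2). The helper names stripped_lines and read_log are made up for illustration.

```python
import io


def stripped_lines(stream):
    """Yield each line of an open text stream with surrounding whitespace removed."""
    for line in stream:
        yield line.strip()


def read_log(path):
    """Lazily yield stripped lines from the file at `path` (hypothetical helper)."""
    # The with block closes the file once iteration finishes
    # or the generator is closed, so no explicit close() is needed.
    with open(path) as log_file:
        yield from stripped_lines(log_file)


# Small demonstration on an in-memory stream instead of a real log file.
sample = io.StringIO("  first entry \nsecond entry\n")
for line in stripped_lines(sample):
    print(repr(line))
```

Because both helpers are generators, only one line is held in memory at a time, which is the point of the loop above.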
> Secondly, how do I parse that timestamp into one of the Python time
> objects? I saw some ways which looked quite complex, but I thought that
> there must be a simple conversion function that will take a standard
> ctime()-generated string such as the above example and return the time.
> Maybe I just missed it in the docs?
I found no such function in the standard lib, but it seems a rather
simple task. Just for starters:
>>> s = "Tue May 13 12:52:59 2003"
>>> week_day, month, day, t, year = s.split()
>>> print week_day, month, day, t, year
Tue May 13 12:52:59 2003
>>> import time
>>> months = {'May': 5}
>>> hour, minute, second = t.split(':')
>>> print (hour, minute, second)
('12', '52', '59')
>>> r = time.mktime((int(year), months[month], int(day), int(hour),
...                  int(minute), int(second), 0, 0, -1))
>>> time.ctime(r)
'Tue May 13 12:52:59 2003'
If you are under *nix, take a look at the strptime function of the time module.
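A minimal sketch of the strptime route, assuming Python 3 and an English (C) locale, since %a and %b match locale-dependent day and month names. The helper names parse_ctime and split_log_line and the sample log line are hypothetical; the only thing taken from the question is that each line starts with a ctime()-style stamp.

```python
from datetime import datetime

# Layout of a ctime()-style stamp such as "Tue May 13 12:52:59 2003".
CTIME_FORMAT = "%a %b %d %H:%M:%S %Y"
# Such stamps are 24 characters wide for four-digit years.
CTIME_LENGTH = 24


def parse_ctime(stamp):
    """Convert a ctime()-style string into a datetime object."""
    return datetime.strptime(stamp, CTIME_FORMAT)


def split_log_line(line):
    """Split a line into (datetime, rest), assuming the stamp comes first."""
    stamp, rest = line[:CTIME_LENGTH], line[CTIME_LENGTH:]
    return parse_ctime(stamp), rest.strip()


# Made-up log line in the shape the question describes.
when, rest = split_log_line("Tue May 13 12:52:59 2003 Log Whoever: hello")
print(when.isoformat(), rest)
```

time.strptime accepts the same format string and returns a struct_time instead, which can be passed to time.mktime if a plain timestamp is preferred.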
>
>
> Lastly (for now), my basic regular expression parsing works like this:
>
> pattern = re.compile(" Log " + playerName + ": ")
> return [eachLine for eachLine in lines if pattern.search(eachLine)]
>
> This obviously returns each line that matches "Log Whoever". But what if
> I want to return the groups from the regular expression? For example, if
> I wanted to just return the 'whoever' in the above situation, how would
> I change the list comprehension to do this?
I'd use a couple of them. E.g.:
pattern = re.compile(" Log " + playerName + ": ")
return [match.span() for match in map(pattern.search, lines) if match]
(sorry, Mozilla wraps lines and I have no time to fix it right now)
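Note that match.span() gives the start and end offsets of the match, not its text. To get the 'whoever' part itself, the pattern needs a capturing group. A minimal sketch, assuming Python 3 and assuming the name is everything between " Log " and the next ": "; the function names and sample lines are invented for illustration.

```python
import re

# Capture whatever sits between " Log " and the following ": ".
LOG_ENTRY = re.compile(r" Log (?P<player>[^:]+): ")


def player_names(lines):
    """Return the captured player name from every line that matches."""
    matches = map(LOG_ENTRY.search, lines)
    return [match.group("player") for match in matches if match]


def lines_for_player(lines, player_name):
    """Return the lines logged by one player."""
    # re.escape keeps characters such as "." or "[" in a name from
    # being interpreted as regular expression syntax.
    pattern = re.compile(" Log " + re.escape(player_name) + ": ")
    return [line for line in lines if pattern.search(line)]


# Made-up log lines in the shape the question describes.
sample = [
    "Tue May 13 12:52:59 2003 Log Whoever: joined",
    "Tue May 13 12:53:01 2003 Server restarted",
    "Tue May 13 12:53:05 2003 Log Ben S: left",
]
print(player_names(sample))
print(lines_for_player(sample, "Whoever"))
```

A named group is used here only for readability; match.group(1) with a plain (...) group would work the same way.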
hth,
anton.