Basic questions regarding parsing logfiles

Anton Muhin antonmuhin at sendmail.ru
Thu May 15 04:05:45 EDT 2003


Ben S wrote:
> Hi,
> 
> Basically I'm trying to use Python to parse and process some log files
> produced by a game. They take the fairly standard format of a time stamp
> in the form of "Tue May 13 12:52:59 2003", and then some more
> information. A few questions:
> 
> I load the file like this:
> 
>   logFile = file(filename, 'rU')
>   lines = map(string.strip, logFile.readlines())
> 
> Is that a reasonably good way to do it (while stripping excess
> whitespace too)?
I suppose, yes. However, I'm more used to the following idiom:

logFile = file(filename) # 'r' is by default and I don't understand 'U'
for line in logFile:
     line = line.strip()
     # do something....
logFile.close()

This variant reads one line at a time instead of building the whole 
list first, so it may need less memory, which can matter if you are 
processing huge files.
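
For what it's worth, the same loop can be sketched in current Python 3 
syntax, where a with-statement takes care of closing the file. The helper 
name iter_stripped_lines and the sample log line are made up for 
illustration; they are not from the original question.

```python
import os
import tempfile


def iter_stripped_lines(path):
    """Yield each line of the file at `path` with surrounding whitespace removed."""
    # The with-statement closes the file even if processing raises.
    with open(path) as log_file:
        for line in log_file:
            yield line.strip()


# Small demonstration on a throwaway file (the log line is hypothetical).
fd, demo_path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "w") as handle:
    handle.write("  Tue May 13 12:52:59 2003 Log Ben: hello  \n\n")
demo_lines = list(iter_stripped_lines(demo_path))
os.remove(demo_path)
```

Because it is a generator, only one line is held in memory at a time 
unless the caller collects the results into a list, as the demonstration does.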

> Secondly, how do I parse that timestamp into one of the Python time
> objects? I saw some ways which looked quite complex, but I thought that
> there must be a simple conversion function that will take a standard
> ctime()-generated string such as the above example and return the time.
> Maybe I just missed it in the docs?
I found no such function in the standard lib, but it seems a rather 
simple task. Just for starters:

 >>> s = "Tue May 13 12:52:59 2003"
 >>> week_day, month, day, t, year = s.split()
 >>> print week_day, month, day, t, year
Tue May 13 12:52:59 2003
 >>> import time
 >>> months = {'May': 5}
 >>> hour, minute, second = t.split(':')
 >>> print (hour, minute, second)
('12', '52', '59')
 >>> r = time.mktime((int(year), months[month], int(day), int(hour),
 ...                  int(minute), int(second), 0, 0, -1))
 >>> time.ctime(r)
'Tue May 13 12:52:59 2003'
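
The session above only knows about May. A slightly fuller sketch of the 
same manual approach, in current Python 3 syntax, might look like this. 
The names MONTHS and parse_ctime are mine, and it assumes the English 
month abbreviations that ctime() produces.

```python
import time

# Month abbreviations as produced by ctime() (assumed English/C locale).
MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def parse_ctime(stamp):
    """Convert a ctime()-style string to seconds since the epoch (local time)."""
    _week_day, month, day, clock, year = stamp.split()
    hour, minute, second = clock.split(":")
    # Weekday and day-of-year are passed as 0 because mktime ignores them;
    # -1 lets the library decide whether daylight saving time applies.
    return time.mktime((int(year), MONTHS[month], int(day),
                        int(hour), int(minute), int(second), 0, 0, -1))
```

Feeding the result back through time.ctime() should give the original 
string again, which makes for an easy sanity check.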

If you are under *nix, take a look at the strptime function of the time module.
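
As a rough sketch of the strptime route (Python 3 syntax; as far as I 
know, current Python versions ship time.strptime on every platform): the 
format string below spells out the ctime() layout, and the name 
CTIME_FORMAT is just my label for it. It assumes the weekday and month 
names are in English, i.e. the C locale.

```python
import time

# Layout of a ctime()-style stamp: weekday, month, day, clock, year.
CTIME_FORMAT = "%a %b %d %H:%M:%S %Y"

parsed = time.strptime("Tue May 13 12:52:59 2003", CTIME_FORMAT)

# struct_time exposes the fields by name; mktime turns it into
# seconds since the epoch, interpreted as local time.
seconds = time.mktime(parsed)
```

The weekday in the string is parsed as well, so parsed.tm_wday comes out 
as 1 for Tuesday (Monday is 0).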

> 
> 
> Lastly (for now), my basic regular expression parsing works like this:
> 
>     pattern = re.compile(" Log " + playerName + ": ")
>     return [eachLine for eachLine in lines if pattern.search(eachLine)]
> 
> This obviously returns each line that matches "Log Whoever". But what if
> I want to return the groups from the regular expression? For example, if
> I wanted to just return the 'whoever' in the above situation, how would
> I change the list comprehension to do this?
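I'd combine map with a list comprehension. Note that span() gives the 
(start, end) offsets of each match; for the matched text itself use 
match.group(). E.g.: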
     pattern = re.compile(" Log " + playerName + ": ")
     return [match.span() for match in map(pattern.search, lines) if match]

(sorry, Mozilla wraps lines and I have not time to fix it right now)
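
To get at the groups themselves, the pattern needs a parenthesised group. 
A minimal sketch in Python 3 syntax, assuming player names consist of 
word characters only; the function name logged_players and the sample 
lines are invented for illustration.

```python
import re


def logged_players(lines):
    """Return the player name from every line containing ' Log <name>: '."""
    # The parentheses form group 1, which captures the player name.
    pattern = re.compile(r" Log (\w+): ")
    return [match.group(1) for match in map(pattern.search, lines) if match]


# Hypothetical log lines in the format described in the question.
sample = [
    "Tue May 13 12:52:59 2003 Log Ben: hello",
    "Tue May 13 12:53:10 2003 Log Anton: hi",
    "Tue May 13 12:53:30 2003 Server restarted",
]
players = logged_players(sample)
```

If a fixed playerName is still spliced into the pattern, it is probably 
worth passing it through re.escape() first so that any special 
characters in the name are matched literally.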

hth,
anton.