parse date/time from a log entry with only strftime (and no regexen)

MRAB google at mrabarnett.plus.com
Tue Feb 3 11:00:40 EST 2009


Simon Mullis wrote:
 > Hi All
 >
 > I'm writing a script to help with analyzing log files timestamps and
 > have a very specific question on which I'm momentarily stumped....
 >
 > I'd like the script to support multiple log file types, so allow a
 > strftime format to be passed in as a cli switch (default is %Y-%m-%d
 > %H:%M:%S).
 >
 > When it comes to actually doing the analysis I want to store or discard
 > the log entry based on certain criteria. In fact, I only need the log
 > line timestamp.
 >
 > I'd like to do this in one step and therefore not require the user to
 > supply a regex as well as a strftime format:
 >
 >  >>> import datetime
 >  >>> p = datetime.datetime.strptime("2008-07-23 12:18:28 this is the
 > remainder of the log line that I do not care about", "%Y-%m-%d %H:%M:%S")
 > Traceback (most recent call last):
 >   File "<stdin>", line 1, in <module>
 >   File "/opt/local/lib/python2.5/_strptime.py", line 333, in strptime
 >     data_string[found.end():])
 > ValueError: unconverted data remains:  this is the remainder of the log
 > line that I do not care about
 >
 >  >>> repr(p)
 > NameError: name 'p' is not defined
 >
 > Clearly the strptime method above can grab the right bits of data but
 > the object "p" is not created due to the error.
 >
 > So, my options are:
 >
 > 1 - Only support one log format.
 >
 > 2 - Support any log format but require a regex as well as a strftime
 > format so I can extract the "timestamp" portion.
 >
 > 3 - Create another class/method with a lookup table for the strftime
 > options that automagically creates the correct regex to extract the
 > right string from the log entry... (or is this overly complicated)
 >
 > 4 - Override the method above (strptime) to allow what I'm trying to do.
 >
 > 5 - Some other very clever and elegant solution that I would not ever
 > manage to think of myself....
 >
 >
 > Am I making any sense whatsoever?
 >
 > Thanks
 >
 > SM
 >
 > (P.S. The reason I don't want the end user to supply a regex for the
 > timestamp in the log-entry is that we're already using 2 other regexes as
 > cli switches to select the file glob and log line to match....)
 >
If the timestamp is always at the start of the line (and I expect it is)
and is always the same length then you could calculate how long the
timestamp is from the format (eg "%Y" matches 4 characters) and use
string slicing.
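
A minimal sketch of the slicing idea, assuming the timestamp sits at the
very start of the line and that every directive in the format produces
zero-padded, fixed-width output. The FIXED_WIDTHS table and both function
names are made up for illustration; they are not part of the standard
library, and the table covers only a handful of directives.

```python
import datetime

# Assumed widths for zero-padded numeric directives (illustrative subset).
FIXED_WIDTHS = {"Y": 4, "y": 2, "m": 2, "d": 2, "H": 2, "M": 2, "S": 2, "j": 3, "%": 1}


def timestamp_width(fmt):
    """Return how many characters a timestamp in this format occupies."""
    width = 0
    i = 0
    while i < len(fmt):
        if fmt[i] == "%":
            directive = fmt[i + 1:i + 2]
            if directive not in FIXED_WIDTHS:
                raise ValueError("no fixed width known for %%%s" % directive)
            width += FIXED_WIDTHS[directive]
            i += 2
        else:
            # Literal characters in the format count as one character each.
            width += 1
            i += 1
    return width


def parse_leading_timestamp(line, fmt):
    """Slice the timestamp off the front of the line and parse only that."""
    return datetime.datetime.strptime(line[:timestamp_width(fmt)], fmt)
```

With the default format "%Y-%m-%d %H:%M:%S" the computed width is 19, so
only the first 19 characters of the log line are handed to strptime and
the rest of the line is ignored.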

If the timestamp isn't a fixed length then generating a regex might be
needed.
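
For the variable-length case, one possible sketch is a lookup table that
turns each directive into a regex fragment, so the user still supplies only
the format. The DIRECTIVE_PATTERNS table and the function names are
hypothetical, the table covers only a few directives, and the %b pattern
assumes three-letter English month abbreviations.

```python
import datetime
import re

# Assumed regex fragments per directive (illustrative subset only).
DIRECTIVE_PATTERNS = {
    "Y": r"\d{4}",
    "y": r"\d{2}",
    "m": r"\d{1,2}",
    "d": r"\d{1,2}",
    "H": r"\d{1,2}",
    "M": r"\d{1,2}",
    "S": r"\d{1,2}",
    "j": r"\d{1,3}",
    "b": r"[A-Za-z]{3}",  # assumes English/C-locale abbreviations
    "%": "%",
}


def format_to_regex(fmt):
    """Build a compiled regex matching a timestamp written in this format."""
    parts = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%":
            directive = fmt[i + 1:i + 2]
            if directive not in DIRECTIVE_PATTERNS:
                raise ValueError("unsupported directive %%%s" % directive)
            parts.append(DIRECTIVE_PATTERNS[directive])
            i += 2
        elif ch.isspace():
            # Treat any run of whitespace in the format as one-or-more spaces.
            if not parts or parts[-1] != r"\s+":
                parts.append(r"\s+")
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))


def extract_timestamp(line, fmt):
    """Return the datetime at the start of the line, or None if none matches."""
    match = format_to_regex(fmt).match(line)
    if match is None:
        return None
    return datetime.datetime.strptime(match.group(0), fmt)
```

Because the regex is used only to find where the timestamp ends, strptime
still does the actual parsing and validation of the matched text.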
