log parser design question

avidfan noone at nowhere.com
Sun Feb 4 01:55:17 EST 2007


On Mon, 29 Jan 2007 23:11:32 -0600, avidfan <noone at nowhere.com> wrote:

>On 28 Jan 2007 21:20:47 -0800, "Paul McGuire" <ptmcg at austin.rr.com>
>wrote:
>
>>On Jan 27, 10:43 pm, avidfan <n... at nowhere.com> wrote:
>>> I need to parse a log file using python and I need some advice/wisdom
>>> on the best way to go about it:
>>>
>>> The log file entries will consist of something like this:
>>>
>>> ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled
>>>         locked working.lock
>>>         status running
>>>         status complete
>>>
>>> ID=9009 IID=87234785 execute wait - 01.21.2007 status wait
>>>         waiting to lock
>>>         status wait
>>>         waiting on ID=8688
>>>
>>> and so on...
>>>
>>For the parsing of this data, here is a pyparsing approach.  Once 
>>parse, the pyparsing ParseResults data structures can be massaged into 
>>a queryable list.  See the examples at the end for accessing the 
>>individual parsed fields.
>>
>>-- Paul
>>
>>data = """
>>ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled
>>        locked working.lock
>>        status running
>>        status complete
>>
>>
>>ID=9009 IID=87234785 execute wait - 01.21.2007 status wait
>>        waiting to lock
>>        status wait
>>        waiting on ID=8688
>>
>>"""
>>from pyparsing import *
>>
>>integer=Word(nums)
>>idref = "ID=" + integer.setResultsName("id")
>>iidref = "IID=" + integer.setResultsName("iid")
>>date = Regex(r"\d\d\.\d\d\.\d{4}")
>>
>>logLabel = Group("execute" + oneOf("begin wait"))
>>logStatus = Group("status" + oneOf("enabled wait"))
>>lockQual = Group("locked" + Word(alphanums+"."))
>>waitingOnQual = Group("waiting on" + idref)
>>statusQual = Group("status" + oneOf("running complete wait"))
>>waitingToLockQual = Group(Literal("waiting to lock"))
>>statusQualifier = statusQual | waitingOnQual | waitingToLockQual | 
>>lockQual
>>logEntry = idref + iidref + logLabel.setResultsName("logtype") + "-" \
>>    + date + logStatus.setResultsName("status") \
>>    + ZeroOrMore(statusQualifier).setResultsName("quals")
>>
>>for tokens in logEntry.searchString(data):
>>    print tokens
>>    print tokens.dump()
>>    print tokens.id
>>    print tokens.iid
>>    print tokens.status
>>    print tokens.quals
>>    print
>>
>>prints:
>>
>>['ID=', '8688', 'IID=', '98889998', ['execute', 'begin'], '-', 
>>'01.21.2007', ['status', 'enabled'], ['locked', 'working.lock'], 
>>['status', 'running'], ['status', 'complete']]
>>['ID=', '8688', 'IID=', '98889998', ['execute', 'begin'], '-', 
>>'01.21.2007', ['status', 'enabled'], ['locked', 'working.lock'], 
>>['status', 'running'], ['status', 'complete']]
>>- id: 8688
>>- iid: 98889998
>>- logtype: ['execute', 'begin']
>>- quals: [['locked', 'working.lock'], ['status', 'running'], 
>>['status', 'complete']]
>>- status: ['status', 'enabled']
>>8688
>>98889998
>>['status', 'enabled']
>>[['locked', 'working.lock'], ['status', 'running'], ['status', 
>>'complete']]
>>
>>['ID=', '9009', 'IID=', '87234785', ['execute', 'wait'], '-', 
>>'01.21.2007', ['status', 'wait'], ['waiting to lock'], ['status', 
>>'wait'], ['waiting on', 'ID=', '8688']]
>>['ID=', '9009', 'IID=', '87234785', ['execute', 'wait'], '-', 
>>'01.21.2007', ['status', 'wait'], ['waiting to lock'], ['status', 
>>'wait'], ['waiting on', 'ID=', '8688']]
>>- id: 9009
>>- iid: 87234785
>>- logtype: ['execute', 'wait']
>>- quals: [['waiting to lock'], ['status', 'wait'], ['waiting on', 
>>'ID=', '8688']]
>>- status: ['status', 'wait']
>>9009
>>87234785
>>['status', 'wait']
>>[['waiting to lock'], ['status', 'wait'], ['waiting on', 'ID=', 
>>'8688']]
>
>Paul,
>
>Thanks!  That's a great module.  I've been going through the docs and
>it seems to do exactly what I need...
>
>I appreciate your help!

http://www.camelrichard.org/roller/page/camelblog?entry=h3_parsing_log_files_with

Thanks, Paul!




More information about the Python-list mailing list