log parser design question

avidfan noone at nowhere.com
Tue Jan 30 00:11:32 EST 2007


On 28 Jan 2007 21:20:47 -0800, "Paul McGuire" <ptmcg at austin.rr.com>
wrote:

>On Jan 27, 10:43 pm, avidfan <n... at nowhere.com> wrote:
>> I need to parse a log file using python and I need some advice/wisdom
>> on the best way to go about it:
>>
>> The log file entries will consist of something like this:
>>
>> ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled
>>         locked working.lock
>>         status running
>>         status complete
>>
>> ID=9009 IID=87234785 execute wait - 01.21.2007 status wait
>>         waiting to lock
>>         status wait
>>         waiting on ID=8688
>>
>> and so on...
>>
>For parsing this data, here is a pyparsing approach.  Once parsed,
>the pyparsing ParseResults data structures can be massaged into
>a queryable list.  See the examples at the end for accessing the
>individual parsed fields.
>
>-- Paul
>
>data = """
>ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled
>        locked working.lock
>        status running
>        status complete
>
>
>ID=9009 IID=87234785 execute wait - 01.21.2007 status wait
>        waiting to lock
>        status wait
>        waiting on ID=8688
>
>"""
>from pyparsing import *
>
>integer=Word(nums)
>idref = "ID=" + integer.setResultsName("id")
>iidref = "IID=" + integer.setResultsName("iid")
>date = Regex(r"\d\d\.\d\d\.\d{4}")
>
>logLabel = Group("execute" + oneOf("begin wait"))
>logStatus = Group("status" + oneOf("enabled wait"))
>lockQual = Group("locked" + Word(alphanums+"."))
>waitingOnQual = Group("waiting on" + idref)
>statusQual = Group("status" + oneOf("running complete wait"))
>waitingToLockQual = Group(Literal("waiting to lock"))
>statusQualifier = statusQual | waitingOnQual | waitingToLockQual | \
>    lockQual
>logEntry = idref + iidref + logLabel.setResultsName("logtype") + "-" \
>    + date + logStatus.setResultsName("status") \
>    + ZeroOrMore(statusQualifier).setResultsName("quals")
>
>for tokens in logEntry.searchString(data):
>    print tokens
>    print tokens.dump()
>    print tokens.id
>    print tokens.iid
>    print tokens.status
>    print tokens.quals
>    print
>
>prints:
>
>['ID=', '8688', 'IID=', '98889998', ['execute', 'begin'], '-', 
>'01.21.2007', ['status', 'enabled'], ['locked', 'working.lock'], 
>['status', 'running'], ['status', 'complete']]
>['ID=', '8688', 'IID=', '98889998', ['execute', 'begin'], '-', 
>'01.21.2007', ['status', 'enabled'], ['locked', 'working.lock'], 
>['status', 'running'], ['status', 'complete']]
>- id: 8688
>- iid: 98889998
>- logtype: ['execute', 'begin']
>- quals: [['locked', 'working.lock'], ['status', 'running'], 
>['status', 'complete']]
>- status: ['status', 'enabled']
>8688
>98889998
>['status', 'enabled']
>[['locked', 'working.lock'], ['status', 'running'], ['status', 
>'complete']]
>
>['ID=', '9009', 'IID=', '87234785', ['execute', 'wait'], '-', 
>'01.21.2007', ['status', 'wait'], ['waiting to lock'], ['status', 
>'wait'], ['waiting on', 'ID=', '8688']]
>['ID=', '9009', 'IID=', '87234785', ['execute', 'wait'], '-', 
>'01.21.2007', ['status', 'wait'], ['waiting to lock'], ['status', 
>'wait'], ['waiting on', 'ID=', '8688']]
>- id: 9009
>- iid: 87234785
>- logtype: ['execute', 'wait']
>- quals: [['waiting to lock'], ['status', 'wait'], ['waiting on', 
>'ID=', '8688']]
>- status: ['status', 'wait']
>9009
>87234785
>['status', 'wait']
>[['waiting to lock'], ['status', 'wait'], ['waiting on', 'ID=', 
>'8688']]
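
The "massaged into a queryable list" step mentioned above is not shown in the post. Below is a minimal sketch of one way it might look, assuming each parsed entry has first been copied into a plain dict. The `entries` list is hand-copied from the dump() output above. The helper names `waiting_on` and `last_status` and the `by_id` index are hypothetical, introduced only for illustration; they are not part of pyparsing.

```python
# Hypothetical records, copied by hand from the dump() output above.
# In practice they would be built inside the searchString() loop from
# tokens.id, tokens.iid, tokens.logtype, tokens.status and tokens.quals.
entries = [
    {
        "id": "8688",
        "iid": "98889998",
        "logtype": ["execute", "begin"],
        "status": ["status", "enabled"],
        "quals": [["locked", "working.lock"],
                  ["status", "running"],
                  ["status", "complete"]],
    },
    {
        "id": "9009",
        "iid": "87234785",
        "logtype": ["execute", "wait"],
        "status": ["status", "wait"],
        "quals": [["waiting to lock"],
                  ["status", "wait"],
                  ["waiting on", "ID=", "8688"]],
    },
]


def waiting_on(entry):
    """Return the IDs this entry waits on (last item of each 'waiting on' qualifier)."""
    return [q[-1] for q in entry["quals"] if q[0] == "waiting on"]


def last_status(entry):
    """Return the last 'status' qualifier, falling back to the header status."""
    statuses = [q[1] for q in entry["quals"] if q[0] == "status"]
    return statuses[-1] if statuses else entry["status"][1]


# Index the entries by ID so a single entry can be looked up directly.
by_id = {e["id"]: e for e in entries}

# Example query: which entries are blocked, and on which IDs?
blocked = {e["id"]: waiting_on(e) for e in entries if waiting_on(e)}

print(blocked)
print(last_status(by_id["8688"]))
```

Plain dicts and lists keep the queries independent of the parser, so the same helpers would work whether the records come from pyparsing or from some other parsing approach.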

Paul,

Thanks!  That's a great module.  I've been going through the docs and
it seems to do exactly what I need...

I appreciate your help!



