Help with pyparsing and dealing with null values

avidfan noone at nowhere.com
Wed Oct 31 11:42:24 EDT 2007


On Mon, 29 Oct 2007 05:45:26 -0700, Paul McGuire <ptmcg at austin.rr.com>
wrote:

>On Oct 29, 1:11 am, avidfan <no... at nowhere.com> wrote:
>> Help with pyparsing and dealing with null values
>>
>> I am trying to parse a log file (web.out) similar to this:
>>
>> -----------------------------------------------------------
>>
>> MBeanName: "mtg-model:Name=mtg-model_managed2,Type=Server"
>>         AcceptBacklog: 50
><snip>
>>         ExpectedToRun: false
>>         ExternalDNSName:
>>         ExtraEjbcOptions:
>>         ExtraRmicOptions:
>>         GracefulShutdownTimeout: 0
>>
>> -----------------------------------------------------------
>>
>> and I need the indented values (eventually) in a dictionary.  As you
>> can see, some of the fields have a value, and some do not.  It appears
>> that the code I have so far is not dealing with the null values and
>> colons as I had planned.
>>
>
>This is a very good first cut at the problem.  Here are some tips to
>get you going again:
>
>1. Literal("\n") won't work; use LineEnd() instead.  Literals are for
>non-whitespace literal strings.
>
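A minimal sketch of tip 1, assuming a current pyparsing install and a made-up one-line input. As far as I can tell, the default whitespace skipping consumes the newline before Literal("\n") gets a chance to match, while LineEnd() leaves newlines out of the whitespace it skips:

```python
from pyparsing import LineEnd, Literal, ParseException, Word, alphas

text = "abc\n"  # hypothetical input, not taken from the original log

# Literal("\n") never sees the newline: it is skipped as whitespace first.
try:
    (Word(alphas) + Literal("\n")).parseString(text)
    literal_matched = True
except ParseException:
    literal_matched = False

# LineEnd() does not treat "\n" as skippable whitespace, so it matches.
line_end_tokens = (Word(alphas) + LineEnd()).parseString(text).asList()

print(literal_matched)
print(line_end_tokens)
```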
>
>2. "all = SkipTo(end)" can be removed; use restOfLine instead of all.
>("all" as a variable name masks Python 2.5's "all" builtin function.)
>
>
>3. In addition to identity, you might consider defining some other
>known value types:
>
>boolean = oneOf("true false")
>boolean.setParseAction(lambda toks: toks[0]=="true")
>
>integer = Combine(Optional("-") + Word(nums))
>integer.setParseAction(lambda toks: int(toks[0]))
>
>These will do data conversion for you at parse time, so that the
>values are already in int or bool form when you access them later.
>
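To see the parse-time conversion from tip 3 in isolation, here is a small sketch using the two expressions exactly as quoted above. The input strings are made up for illustration, and print() is written in Python 3 form:

```python
from pyparsing import Combine, Optional, Word, nums, oneOf

# Same definitions as in tip 3.
boolean = oneOf("true false")
boolean.setParseAction(lambda toks: toks[0] == "true")

integer = Combine(Optional("-") + Word(nums))
integer.setParseAction(lambda toks: int(toks[0]))

# The parse actions run during parsing, so the tokens are already
# Python bool/int objects rather than strings.
timeout = integer.parseString("-1")[0]
backlog = integer.parseString("50")[0]
expected = boolean.parseString("false")[0]

print(type(timeout), timeout)
print(type(backlog), backlog)
print(type(expected), expected)
```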
>
>4. The significant change is to this line (I've replaced all with
>restOfLine):
>
>pairs = Group(identity + colon + Optional(identity) + restOfLine)
>
>The problem is that pyparsing's whitespace skipping treats newlines as
>whitespace, so Optional(identity) will match an identity even if it is
>on the next line.  For keys that have no value, you end up reading past
>the end-of-line and taking the next key name as the value for the
>previous key.  To work around this, define the value as something that
>must be on the same line, using the NotAny lookahead, which you can
>abbreviate with the ~ operator.
>
>pairs = Group(identity + colon + Optional(~end + (identity | restOfLine)) + end)
>
>If we add in the other known value types, this gets a bit unwieldy, so
>I recommend you define value separately:
>
>value = boolean | integer | identity | restOfLine
>pairs = Group(identity + colon + Optional(~end + value) + end )
>
>At this point, I think you have a working parser for your log data.
>
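For anyone following along, here is a sketch comparing the two versions of pairs from tip 4. My original definitions of identity, colon and end are not shown in this thread, so the ones below are assumptions chosen to fit the sample data. The typed values from tip 3 are left out to keep it short:

```python
from pyparsing import (Group, LineEnd, OneOrMore, Optional, Suppress, Word,
                       alphanums, restOfLine)

# Assumed definitions (not shown in the thread).
identity = Word(alphanums + "_-.,")
colon = Suppress(":")
end = LineEnd().suppress()

# First attempt: Optional(identity) skips the newline and grabs the next key.
naive = Group(identity + colon + Optional(identity) + restOfLine)

# Fixed version: ~end refuses to look for a value when the line is over.
value = identity | restOfLine
fixed = Group(identity + colon + Optional(~end + value) + end)

sample = "ExternalDNSName:\nExtraEjbcOptions:\nGracefulShutdownTimeout: 0\n"

naive_first = naive.parseString(sample)[0].asList()
fixed_all = OneOrMore(fixed).parseString(sample).asList()

print(naive_first)
print(fixed_all)
```

With the lookahead in place, each key with no value comes back as a one-element group instead of swallowing the next line.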
>
>5. (Extra Credit) Lastly, to create a dictionary, you are all set to
>just add pyparsing's Dict class.  Change:
>
>logEntry = MBeanName + ServerName("servername") + OneOrMore(pairs)
>
>to:
>
>logEntry = MBeanName + ServerName("servername") + Dict(OneOrMore(pairs))
>
>(I've also removed ".setResultsName", using the new shortened form for
>setting results names.)
>
>Dict will return the parsed tokens as-is, but it will also define
>results names using the tokens[0] element of each list of tokens
>returned by pairs - the values will be the tokens[1:], so that if a
>value expression contains multiple tokens, they all will be associated
>with the results name key.
>
>Now you can replace the results listing code:
>
>    for t in tokens:
>       print t
>
>with
>
>    print tokens.dump()
>
>And you can access the tokens as if they are a dict, using:
>
>    print tokens.keys()
>    print tokens.values()
>    print tokens["ClasspathServletDisabled"]
>
>If you prefer, for keys that are valid Python identifiers (all of
>yours appear to be), you can just use object.attribute notation:
>
>    print tokens.ClasspathServletDisabled
>
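A self-contained sketch of the Dict step, under the same assumed definitions of identity, colon and end as before (they are not shown in this thread). The MBeanName and ServerName parts are omitted, and the sample is a four-line excerpt rather than the full log:

```python
from pyparsing import (Combine, Dict, Group, LineEnd, OneOrMore, Optional,
                       Suppress, Word, alphanums, nums, oneOf, restOfLine)

# Assumed definitions (not shown in the thread).
identity = Word(alphanums + "_-.,")
colon = Suppress(":")
end = LineEnd().suppress()

boolean = oneOf("true false").setParseAction(lambda toks: toks[0] == "true")
integer = Combine(Optional("-") + Word(nums)).setParseAction(
    lambda toks: int(toks[0]))

value = boolean | integer | identity | restOfLine
pairs = Group(identity + colon + Optional(~end + value) + end)
body = Dict(OneOrMore(pairs))

sample = (
    "AcceptBacklog: 50\n"
    "ClasspathServletDisabled: false\n"
    "ExternalDNSName:\n"
    "GracefulShutdownTimeout: 0\n"
)

tokens = body.parseString(sample)
print(tokens.dump())

keys = sorted(tokens.keys())
backlog = tokens["AcceptBacklog"]            # dict-style lookup
disabled = tokens.ClasspathServletDisabled   # attribute-style lookup
dns_name = tokens["ExternalDNSName"]         # key with no value
```

In this sketch a key with no value comes back as an empty string, so it can be told apart from a key that is missing altogether by checking tokens.keys().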
>Here is some sample output, using dump(), keys(), and attribute
>lookup:
>
>tokens.dump() -> ['MBeanName:',
>'"mtg-model:Name=mtg-model_managed2,Type=Server"', ['AcceptBacklog', 50],
>['AdministrationPort', 0], ['AutoKillIfFailed', False],
>['AutoRestart', True], ['COM', 'mtg-model_managed2'], ['COMEnabled',
>False], ['CachingDisabled', True], ['ClasspathServletDisabled',
>False], ['ClientCertProxyEnabled', False],
>['Cluster', 'mtg-model-cluster'], ['ClusterRuntime', 'mtg-model-cluster'],
>['ClusterWeight',
>100], ['CompleteCOMMessageTimeout', -1],
>['CompleteHTTPMessageTimeout', -1], ['CompleteIIOPMessageTimeout',
>-1], ['CompleteMessageTimeout', 60], ['CompleteT3MessageTimeout', -1],
>['CustomIdentityKeyStoreFileName'],
>['CustomIdentityKeyStorePassPhrase'],
>['CustomIdentityKeyStorePassPhraseEncrypted'],
>['CustomIdentityKeyStoreType'], ['CustomTrustKeyStoreFileName'],
>['CustomTrustKeyStorePassPhrase'],
>['CustomTrustKeyStorePassPhraseEncrypted'],
>['CustomTrustKeyStoreType'], ['DefaultIIOPPassword'],
>['DefaultIIOPPasswordEncrypted'], ['DefaultIIOPUser'],
>['DefaultInternalServletsDisabled', False], ['DefaultProtocol', 't3'],
>['DefaultSecureProtocol', 't3s'], ['DefaultTGIOPPassword'],
>['DefaultTGIOPPasswordEncrypted', ' ****** '], ['DefaultTGIOPUser',
>'guest'], ['DomainLogFilter'], ['EnabledForDomainLog', True],
>['ExecuteQueues', 'weblogic.kernel.Default,foglight'],
>['ExpectedToRun', False], ['ExternalDNSName'], ['ExtraEjbcOptions'],
>['ExtraRmicOptions'], ['GracefulShutdownTimeout', 0]]
>- AcceptBacklog: 50
>- AdministrationPort: 0
>- AutoKillIfFailed: False
>- AutoRestart: True
>- COM: mtg-model_managed2
>- COMEnabled: False
>- CachingDisabled: True
>- ClasspathServletDisabled: False
>- ClientCertProxyEnabled: False
>- Cluster: mtg-model-cluster
>- ClusterRuntime: mtg-model-cluster
>- ClusterWeight: 100
>- CompleteCOMMessageTimeout: -1
>- CompleteHTTPMessageTimeout: -1
>- CompleteIIOPMessageTimeout: -1
>- CompleteMessageTimeout: 60
>- CompleteT3MessageTimeout: -1
>- CustomIdentityKeyStoreFileName:
>- CustomIdentityKeyStorePassPhrase:
>- CustomIdentityKeyStorePassPhraseEncrypted:
>- CustomIdentityKeyStoreType:
>- CustomTrustKeyStoreFileName:
>- CustomTrustKeyStorePassPhrase:
>- CustomTrustKeyStorePassPhraseEncrypted:
>- CustomTrustKeyStoreType:
>- DefaultIIOPPassword:
>- DefaultIIOPPasswordEncrypted:
>- DefaultIIOPUser:
>- DefaultInternalServletsDisabled: False
>- DefaultProtocol: t3
>- DefaultSecureProtocol: t3s
>- DefaultTGIOPPassword:
>- DefaultTGIOPPasswordEncrypted:  ******
>- DefaultTGIOPUser: guest
>- DomainLogFilter:
>- EnabledForDomainLog: True
>- ExecuteQueues: weblogic.kernel.Default,foglight
>- ExpectedToRun: False
>- ExternalDNSName:
>- ExtraEjbcOptions:
>- ExtraRmicOptions:
>- GracefulShutdownTimeout: 0
>- servername: "mtg-model:Name=mtg-model_managed2,Type=Server"
>
>tokens.keys() -> ['ClasspathServletDisabled', 'servername',
>'ExternalDNSName', 'CustomTrustKeyStoreFileName', 'DefaultIIOPUser',
>'ExpectedToRun', 'CachingDisabled', 'CompleteHTTPMessageTimeout',
>'CompleteIIOPMessageTimeout', 'AutoKillIfFailed',
>'ClientCertProxyEnabled', 'ExtraEjbcOptions',
>'CustomTrustKeyStorePassPhraseEncrypted', 'COM',
>'CompleteMessageTimeout', 'CustomIdentityKeyStoreType',
>'CustomTrustKeyStoreType', 'EnabledForDomainLog', 'AutoRestart',
>'DefaultTGIOPPasswordEncrypted', 'CompleteCOMMessageTimeout',
>'DefaultInternalServletsDisabled', 'DefaultProtocol', 'ClusterWeight',
>'ExecuteQueues', 'ExtraRmicOptions', 'CompleteT3MessageTimeout',
>'DefaultTGIOPUser', 'AcceptBacklog', 'DefaultIIOPPassword',
>'DefaultSecureProtocol', 'COMEnabled',
>'CustomIdentityKeyStoreFileName', 'DefaultTGIOPPassword',
>'CustomIdentityKeyStorePassPhraseEncrypted',
>'GracefulShutdownTimeout', 'DefaultIIOPPasswordEncrypted',
>'CustomIdentityKeyStorePassPhrase', 'ClusterRuntime', 'Cluster',
>'DomainLogFilter', 'CustomTrustKeyStorePassPhrase',
>'AdministrationPort']
>
>tokens.ClasspathServletDisabled -> False
>
>
>Cheers,
>-- Paul
>

Thanks, Paul!  That's exactly what I needed!



