Help with pyparsing and dealing with null values

Paul McGuire ptmcg at austin.rr.com
Mon Oct 29 08:45:26 EDT 2007


On Oct 29, 1:11 am, avidfan <no... at nowhere.com> wrote:
> Help with pyparsing and dealing with null values
>
> I am trying to parse a log file (web.out) similar to this:
>
> -----------------------------------------------------------
>
> MBeanName: "mtg-model:Name=mtg-model_managed2,Type=Server"
>         AcceptBacklog: 50
<snip>
>         ExpectedToRun: false
>         ExternalDNSName:
>         ExtraEjbcOptions:
>         ExtraRmicOptions:
>         GracefulShutdownTimeout: 0
>
> -----------------------------------------------------------
>
> and I need the indented values (eventually) in a dictionary.  As you
> can see, some of the fields have a value, and some do not.  It appears
> that the code I have so far is not dealing with the null values and
> colons as I had planned.
>

This is a very good first cut at the problem.  Here are some tips to
get you going again:

1. Literal("\n") won't work; use LineEnd() instead.  Literals are for
non-whitespace literal strings, and pyparsing skips over whitespace
(newlines included) before it tries to match one.
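
To make the difference concrete, here is a minimal sketch of both forms
against a two-line string.  The sample text and variable names are made
up for illustration, and the code assumes Python 3 and a pyparsing
release that still provides the camelCase method names used in this
post:

```python
from pyparsing import LineEnd, Literal, ParseException, Word, alphas

text = "abc\ndef"

# Literal("\n") never sees the newline: it is skipped as leading
# whitespace before the literal is tried, so the match fails at "def".
try:
    (Word(alphas) + Literal("\n")).parseString(text)
    literal_matched = True
except ParseException:
    literal_matched = False

# LineEnd() does not treat "\n" as skippable whitespace, so it matches.
line_end_tokens = (Word(alphas) + LineEnd()).parseString(text).asList()

print(literal_matched)
print(line_end_tokens)
```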


2. "all = SkipTo(end)" can be removed; use restOfLine in place of all.
(As a variable name, "all" also masks the "all" builtin function that
was added in Python 2.5.)
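
As a rough illustration of what restOfLine gives you, the sketch below
parses a made-up "key: value" line (the sample text is an assumption,
not taken from your log).  Note that restOfLine stops before the
newline and does not skip leading whitespace, so the captured value
keeps the space after the colon:

```python
from pyparsing import Suppress, Word, alphas, restOfLine

# A key, a suppressed colon, then everything up to the end of the line.
line = Word(alphas) + Suppress(":") + restOfLine

rest_tokens = line.parseString("key: some value here\nnext: line").asList()

# The second token begins with the space that followed the colon.
print(rest_tokens)
```

If the leading space is unwanted, a parse action that strips the token
is one way to remove it.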


3. In addition to identity, you might consider defining some other
known value types:

boolean = oneOf("true false")
boolean.setParseAction(lambda toks: toks[0]=="true")

integer = Combine(Optional("-") + Word(nums))
integer.setParseAction(lambda toks: int(toks[0]))

These will do data conversion for you at parse time, so that the
values are already in int or bool form when you access them later.
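
A quick way to check the conversions is to parse a couple of bare
values with these expressions on their own.  This is only a sketch; the
input strings are invented, and it assumes Python 3 with the camelCase
pyparsing names:

```python
from pyparsing import Combine, Optional, Word, nums, oneOf

boolean = oneOf("true false")
boolean.setParseAction(lambda toks: toks[0] == "true")

integer = Combine(Optional("-") + Word(nums))
integer.setParseAction(lambda toks: int(toks[0]))

# The parse actions replace the matched text with converted values.
bool_value = boolean.parseString("false")[0]
int_value = integer.parseString("-1")[0]

print(type(bool_value), bool_value)
print(type(int_value), int_value)
```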


4. The significant change is to this line (I've replaced all with
restOfLine):

pairs = Group(identity + colon + Optional(identity) + restOfLine)

The problem is that pyparsing's whitespace skipping also skips over
newlines, so Optional(identity) will read an identity even if it is
not on the same line.  For keys that have no value given, you end up
reading past the end-of-line and taking the next key name as the value
for the previous key.  To work around this, define the value as
something which must be on the same line, using the NotAny lookahead,
which you can abbreviate using the ~ operator.

pairs = Group(identity + colon + Optional(~end + (identity |
restOfLine) ) + end )

If we add in the other known value types, this gets a bit unwieldy, so
I recommend you define value separately:

value = boolean | integer | identity | restOfLine
pairs = Group(identity + colon + Optional(~end + value) + end )
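
Here is a small side-by-side sketch of the original and revised forms
of pairs, run on a few lines from your sample.  Your own definitions of
identity, colon, and end are not shown in this message, so the ones
below are assumptions (in particular the character set for identity and
the suppressed LineEnd), as are the variable names for the results:

```python
from pyparsing import (Combine, Group, LineEnd, OneOrMore, Optional,
                       Suppress, Word, alphanums, nums, oneOf, restOfLine)

# Assumed definitions; the original poster's versions may differ.
identity = Word(alphanums + "._-,")
colon = Suppress(":")
end = LineEnd().suppress()

boolean = oneOf("true false")
boolean.setParseAction(lambda toks: toks[0] == "true")
integer = Combine(Optional("-") + Word(nums))
integer.setParseAction(lambda toks: int(toks[0]))

sample = (
    "        ExpectedToRun: false\n"
    "        ExternalDNSName:\n"
    "        ExtraEjbcOptions:\n"
    "        GracefulShutdownTimeout: 0\n"
)

# Original form: Optional(identity) skips the newline and takes the
# next key name as the value of a key that has no value.
naive = Group(identity + colon + Optional(identity) + restOfLine)
naive_result = OneOrMore(naive).parseString(sample).asList()

# Revised form: ~end stops the value lookup when the line is already over.
value = boolean | integer | identity | restOfLine
pairs = Group(identity + colon + Optional(~end + value) + end)
fixed_result = OneOrMore(pairs).parseString(sample).asList()

print(naive_result)
print(fixed_result)
```

In the first listing ExternalDNSName should show ExtraEjbcOptions as
its value; in the second, each empty key comes back as a one-element
group.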

At this point, I think you have a working parser for your log data.


5. (Extra Credit) Lastly, to create a dictionary, you are all set to
just add pyparsing's Dict class.  Change:

logEntry = MBeanName + ServerName("servername") + OneOrMore(pairs)

to:

logEntry = MBeanName + ServerName("servername") + Dict(OneOrMore(pairs))

(I've also removed ".setResultsName", using the new shortened form for
setting results names.)

Dict will return the parsed tokens as-is, but it will also define
results names using the tokens[0] element of each list of tokens
returned by pairs - the values will be the tokens[1:], so that if a
value expression contains multiple tokens, they all will be associated
with the results name key.

Now you can replace the results listing code:

    for t in tokens:
       print t

with

    print tokens.dump()

And you can access the tokens as if they are a dict, using:

    print tokens.keys()
    print tokens.values()
    print tokens["ClasspathServletDisabled"]

If you prefer, for keys that are valid Python identifiers (all of
yours appear to be), you can just use object.attribute notation:

    print tokens.ClasspathServletDisabled
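
Putting the pieces together, a self-contained sketch of the Dict-based
parser might look like the following.  The definitions of MBeanName,
ServerName, identity, colon, and end are guesses, since your originals
are not shown in this message, and the sample is a shortened version of
your log.  It is written for Python 3, so it uses the print function
rather than the print statement:

```python
from pyparsing import (Combine, Dict, Group, LineEnd, Literal, OneOrMore,
                       Optional, Suppress, Word, alphanums, dblQuotedString,
                       nums, oneOf, restOfLine)

# Assumed definitions; MBeanName and ServerName in particular are
# guesses at what the original poster's expressions look like.
MBeanName = Literal("MBeanName:")
ServerName = dblQuotedString
identity = Word(alphanums + "._-,")
colon = Suppress(":")
end = LineEnd().suppress()

boolean = oneOf("true false")
boolean.setParseAction(lambda toks: toks[0] == "true")
integer = Combine(Optional("-") + Word(nums))
integer.setParseAction(lambda toks: int(toks[0]))

value = boolean | integer | identity | restOfLine
pairs = Group(identity + colon + Optional(~end + value) + end)

logEntry = MBeanName + ServerName("servername") + Dict(OneOrMore(pairs))

sample = (
    'MBeanName: "mtg-model:Name=mtg-model_managed2,Type=Server"\n'
    "        AcceptBacklog: 50\n"
    "        ExpectedToRun: false\n"
    "        ExternalDNSName:\n"
    "        GracefulShutdownTimeout: 0\n"
)

tokens = logEntry.parseString(sample)

print(tokens.dump())
print(tokens["AcceptBacklog"])            # dict-style lookup
print(tokens.ExpectedToRun)               # attribute-style lookup
print(repr(tokens["ExternalDNSName"]))    # a key with no value
```

A key that has no value is still defined in the results, with an empty
string as its value.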

Here is some sample output, using dump(), keys(), and attribute
lookup:

tokens.dump() -> ['MBeanName:',
'"mtg-model:Name=mtg-model_managed2,Type=Server"', ['AcceptBacklog', 50],
['AdministrationPort', 0], ['AutoKillIfFailed', False],
['AutoRestart', True], ['COM', 'mtg-model_managed2'], ['COMEnabled',
False], ['CachingDisabled', True], ['ClasspathServletDisabled',
False], ['ClientCertProxyEnabled', False],
['Cluster', 'mtg-model-cluster'], ['ClusterRuntime', 'mtg-model-cluster'],
['ClusterWeight',
100], ['CompleteCOMMessageTimeout', -1],
['CompleteHTTPMessageTimeout', -1], ['CompleteIIOPMessageTimeout',
-1], ['CompleteMessageTimeout', 60], ['CompleteT3MessageTimeout', -1],
['CustomIdentityKeyStoreFileName'],
['CustomIdentityKeyStorePassPhrase'],
['CustomIdentityKeyStorePassPhraseEncrypted'],
['CustomIdentityKeyStoreType'], ['CustomTrustKeyStoreFileName'],
['CustomTrustKeyStorePassPhrase'],
['CustomTrustKeyStorePassPhraseEncrypted'],
['CustomTrustKeyStoreType'], ['DefaultIIOPPassword'],
['DefaultIIOPPasswordEncrypted'], ['DefaultIIOPUser'],
['DefaultInternalServletsDisabled', False], ['DefaultProtocol', 't3'],
['DefaultSecureProtocol', 't3s'], ['DefaultTGIOPPassword'],
['DefaultTGIOPPasswordEncrypted', ' ****** '], ['DefaultTGIOPUser',
'guest'], ['DomainLogFilter'], ['EnabledForDomainLog', True],
['ExecuteQueues', 'weblogic.kernel.Default,foglight'],
['ExpectedToRun', False], ['ExternalDNSName'], ['ExtraEjbcOptions'],
['ExtraRmicOptions'], ['GracefulShutdownTimeout', 0]]
- AcceptBacklog: 50
- AdministrationPort: 0
- AutoKillIfFailed: False
- AutoRestart: True
- COM: mtg-model_managed2
- COMEnabled: False
- CachingDisabled: True
- ClasspathServletDisabled: False
- ClientCertProxyEnabled: False
- Cluster: mtg-model-cluster
- ClusterRuntime: mtg-model-cluster
- ClusterWeight: 100
- CompleteCOMMessageTimeout: -1
- CompleteHTTPMessageTimeout: -1
- CompleteIIOPMessageTimeout: -1
- CompleteMessageTimeout: 60
- CompleteT3MessageTimeout: -1
- CustomIdentityKeyStoreFileName:
- CustomIdentityKeyStorePassPhrase:
- CustomIdentityKeyStorePassPhraseEncrypted:
- CustomIdentityKeyStoreType:
- CustomTrustKeyStoreFileName:
- CustomTrustKeyStorePassPhrase:
- CustomTrustKeyStorePassPhraseEncrypted:
- CustomTrustKeyStoreType:
- DefaultIIOPPassword:
- DefaultIIOPPasswordEncrypted:
- DefaultIIOPUser:
- DefaultInternalServletsDisabled: False
- DefaultProtocol: t3
- DefaultSecureProtocol: t3s
- DefaultTGIOPPassword:
- DefaultTGIOPPasswordEncrypted:  ******
- DefaultTGIOPUser: guest
- DomainLogFilter:
- EnabledForDomainLog: True
- ExecuteQueues: weblogic.kernel.Default,foglight
- ExpectedToRun: False
- ExternalDNSName:
- ExtraEjbcOptions:
- ExtraRmicOptions:
- GracefulShutdownTimeout: 0
- servername: "mtg-model:Name=mtg-model_managed2,Type=Server"

tokens.keys() -> ['ClasspathServletDisabled', 'servername',
'ExternalDNSName', 'CustomTrustKeyStoreFileName', 'DefaultIIOPUser',
'ExpectedToRun', 'CachingDisabled', 'CompleteHTTPMessageTimeout',
'CompleteIIOPMessageTimeout', 'AutoKillIfFailed',
'ClientCertProxyEnabled', 'ExtraEjbcOptions',
'CustomTrustKeyStorePassPhraseEncrypted', 'COM',
'CompleteMessageTimeout', 'CustomIdentityKeyStoreType',
'CustomTrustKeyStoreType', 'EnabledForDomainLog', 'AutoRestart',
'DefaultTGIOPPasswordEncrypted', 'CompleteCOMMessageTimeout',
'DefaultInternalServletsDisabled', 'DefaultProtocol', 'ClusterWeight',
'ExecuteQueues', 'ExtraRmicOptions', 'CompleteT3MessageTimeout',
'DefaultTGIOPUser', 'AcceptBacklog', 'DefaultIIOPPassword',
'DefaultSecureProtocol', 'COMEnabled',
'CustomIdentityKeyStoreFileName', 'DefaultTGIOPPassword',
'CustomIdentityKeyStorePassPhraseEncrypted',
'GracefulShutdownTimeout', 'DefaultIIOPPasswordEncrypted',
'CustomIdentityKeyStorePassPhrase', 'ClusterRuntime', 'Cluster',
'DomainLogFilter', 'CustomTrustKeyStorePassPhrase',
'AdministrationPort']

tokens.ClasspathServletDisabled -> False


Cheers,
-- Paul




