pyparsing question: single word values with a double quoted string every once in a while

Piet van Oostrum piet at cs.uu.nl
Wed May 27 07:12:03 EDT 2009


>>>>> hubritic <colinlandrum at gmail.com> (h) wrote:

>h> I want to parse a log that has entries like this:
>h> [2009-03-17 07:28:05.545476 -0500] rprt s=d2bpr80d6 m=2 mod=mail
>h> cmd=msg module=access rule=x_dynamic_ip action=discard attachments=0
>h> rcpts=1
>h> routes=DL_UK_ALL,NOT_DL_UK_ALL,default_inbound,firewallsafe,mail01_mail02,spfsafe
>h> size=4363 guid=291f0f108fd3a6e73a11f96f4fb9e4cd hdr_mid=
>h> qid=n2HCS4ks025832 subject="I want to interview you" duration=0.236
>h> elapsed=0.280


>h> the keywords will not always be the same. Also differing log levels
>h> will provide a different mix of keywords.

>h> This is good enough to get the majority of cases where there is a
>h> keyword, a "=" and then a value with no spaces:

>h> Group(Word(alphas + "+_-.").setResultsName("keyword") +  Suppress
>h> (Literal ("=")) + Optional(Word(printables)))

>h> Sometimes there is a subject, which is a quoted string. That is easy
>h> enough to get with this:
>h> dblQuotedString(ZeroOrMore(Word(printables) ) )

>h> My problem is combining them into one expression. Either I wind up
>h> with just the subject or I wind up with the keywords and their
>h> values, one of which is:

>h> subject, '"I'

>h> which is clearly not what I want.

>h> Do I scan each line twice, first looking for quotes ?


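Use MatchFirst (the | operator) and put the quoted string first, so that
it is tried before the plain word. One pass over each line is then enough.

I have also split the expression up to make it more readable: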

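from pyparsing import (Group, Literal, Optional, Suppress, Word,
                       alphas, dblQuotedString, printables)

kw = Word(alphas + "+_-.").setResultsName("keyword")
eq = Suppress(Literal("="))
# Try the quoted string first; fall back to a single unquoted word.
value = dblQuotedString | Optional(Word(printables))

pattern = Group(kw + eq + value)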
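
As a rough check, here is a minimal sketch that applies the pattern to a
shortened version of your log entry. The sample line and the names "line"
and "pairs" are mine, chosen only for illustration, and wrapping the pattern
in OneOrMore is one possible way to collect all pairs, not the only one.

```python
from pyparsing import (Group, Literal, OneOrMore, Optional, Suppress, Word,
                       alphas, dblQuotedString, printables)

kw = Word(alphas + "+_-.").setResultsName("keyword")
eq = Suppress(Literal("="))
# MatchFirst: the quoted string is tried before the plain word.
value = dblQuotedString | Optional(Word(printables))
pattern = Group(kw + eq + value)

# Hypothetical, shortened sample line (not a complete log entry).
line = 'cmd=msg size=4363 subject="I want to interview you" duration=0.236'

# Each Group becomes one [keyword, value] pair.
pairs = [list(tokens) for tokens in OneOrMore(pattern).parseString(line)]

for pair in pairs:
    print(pair)
```

The subject now comes back as one token, with the surrounding quotes still
attached; a removeQuotes parse action on a copy of dblQuotedString should
strip them if you do not want them. One caveat, as far as I can tell:
because pyparsing skips leading whitespace by default, an empty value such
as "hdr_mid=" may pick up the following "qid=..." token as its value, so
entries like that probably need separate handling.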

-- 
Piet van Oostrum <piet at cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: piet at vanoostrum.org


