splitting a words of a line
Paul McGuire
ptmcg at austin.rr.com
Fri Dec 7 09:47:05 EST 2007
On Dec 6, 9:21 am, Sumit <sumit.na... at gmail.com> wrote:
> Hi,
> I am trying to split a line which is in the format below:
>
> AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] "162.44.245.32 CN=dddd
> cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk
> Secure,DC=customer,DC=rxcorp,DC=com" "plysmhc03zp GET /mci/performance/
> SelectProducts.aspx?
> p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]
>
As John Machin mentioned, pyparsing may be helpful to you. Here is a
simple version:
# The sample is one physical line; adjacent string literals are joined
# so that no newline ends up inside the quoted fields.
data = ('AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] '
    '"162.44.245.32 CN=dddd cojack (890),OU=1,OU=Customers,'
    'OU=ISM-Users,OU=kkk Secure,DC=customer,DC=rxcorp,DC=com" '
    '"plysmhc03zp GET /mci/performance/SelectProducts.aspx?'
    'p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]')
# Version 1 - simple
from pyparsing import *

# Punctuation that delimits fields but is dropped from the results
LBRACK,RBRACK,COMMA = map(Suppress,"[],")
num = Word(nums)
# Timestamp such as 23/Sep/2005:16:14:28 -0500
date = Combine(num+"/"+Word(alphas)+"/"+num+":"+num+":"+num+":"+num) + \
    oneOf("+ -") + num
# Return the matched timestamp text as a single string
date.setParseAction(keepOriginalText)
uuid = delimitedList(Word(hexnums),"-",combine=True)
logString = Word(alphas,alphanums) + Word(alphas,alphanums) + \
    LBRACK + date + RBRACK + quotedString + quotedString + \
    LBRACK + uuid + RBRACK + LBRACK + Word(nums) + RBRACK

print(logString.parseString(data))
Prints out:
['AzAccept', 'PLYSSTM01', '23/Sep/2005:16:14:28 -0500',
 '"162.44.245.32 CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk Secure,DC=customer,DC=rxcorp,DC=com"',
 '"plysmhc03zp GET /mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc"',
 'd4b62ca2-09a0-4334622b-0e1c-03c42ba5', '0']
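For comparison, here is a minimal sketch of the same top-level split using only the standard library's re module instead of pyparsing. It is an illustration under assumptions, not part of the pyparsing solution: the names TOKEN and split_log_line are made up for this example, and it assumes that a bracketed field never contains "]" and a quoted field never contains an embedded double quote. Unlike Version 1, it drops the surrounding quotes.

```python
import re

# Sample line from the post, joined back into one physical line.
data = (
    'AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] '
    '"162.44.245.32 CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,'
    'OU=kkk Secure,DC=customer,DC=rxcorp,DC=com" '
    '"plysmhc03zp GET /mci/performance/SelectProducts.aspx?'
    'p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]'
)

# A token is a [bracketed] field, a "quoted" field, or a bare word.
# (Assumption: no nested brackets and no escaped quotes.)
TOKEN = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')

def split_log_line(line):
    """Return the fields of one log line, without brackets or quotes."""
    fields = []
    for m in TOKEN.finditer(line):
        # Exactly one of the three groups matched; keep that one.
        fields.append(next(g for g in m.groups() if g is not None))
    return fields

fields = split_log_line(data)
print(len(fields))   # 7
print(fields[2])     # 23/Sep/2005:16:14:28 -0500
```

A regular expression is enough for this outer split, but it does not validate the timestamp or the identifier the way the pyparsing grammar does.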
And here is a slightly fancier version, which parses the quoted
strings (uses the pprint pretty-printing module to show structure of
the parsed results):
# Version 2 - fancy
from pyparsing import *

LBRACK,RBRACK,COMMA = map(Suppress,"[],")
num = Word(nums)
date = Combine(num+"/"+Word(alphas)+"/"+num+":"+num+":"+num+":"+num) + \
    oneOf("+ -") + num
date.setParseAction(keepOriginalText)
uuid = delimitedList(Word(hexnums),"-",combine=True)
ipAddr = delimitedList(Word(nums),".",combine=True)

# First quoted string: an IP address followed by comma-separated
# KEY=value pairs
keyExpr = Word(alphas.upper())
valExpr = CharsNotIn(',')
qs1Expr = ipAddr + Group(delimitedList(Combine(keyExpr + '=' + valExpr)))

def parseQS1(t):
    # Re-parse the unquoted text of the first string
    return qs1Expr.parseString(t[0])

def parseQS2(t):
    # The second string is whitespace-separated
    return t[0].split()

qs1 = quotedString.copy().setParseAction(removeQuotes, parseQS1)
qs2 = quotedString.copy().setParseAction(removeQuotes, parseQS2)
logString = Word(alphas,alphanums) + Word(alphas,alphanums) + \
    LBRACK + date + RBRACK + qs1 + qs2 + \
    LBRACK + uuid + RBRACK + LBRACK + Word(nums) + RBRACK

from pprint import pprint
pprint(logString.parseString(data).asList())
Prints:
['AzAccept',
 'PLYSSTM01',
 '23/Sep/2005:16:14:28 -0500',
 '162.44.245.32',
 ['CN=dddd cojack (890)',
  'OU=1',
  'OU=Customers',
  'OU=ISM-Users',
  'OU=kkk Secure',
  'DC=customer',
  'DC=rxcorp',
  'DC=com'],
 'plysmhc03zp',
 'GET',
 '/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc',
 'd4b62ca2-09a0-4334622b-0e1c-03c42ba5',
 '0']
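Once the line has been split into a list like this, it can be turned into a record with plain Python. The sketch below starts from the printed result, copied in literally so that it runs without pyparsing. The field names in NAMES are hypothetical, since the post does not say what each column means. Keys such as OU and DC occur more than once, so the KEY=value items are kept as ordered pairs rather than a dict.

```python
# Result list as printed by the Version 2 parser (copied literally).
parsed = [
    'AzAccept', 'PLYSSTM01', '23/Sep/2005:16:14:28 -0500', '162.44.245.32',
    ['CN=dddd cojack (890)', 'OU=1', 'OU=Customers', 'OU=ISM-Users',
     'OU=kkk Secure', 'DC=customer', 'DC=rxcorp', 'DC=com'],
    'plysmhc03zp', 'GET',
    '/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc',
    'd4b62ca2-09a0-4334622b-0e1c-03c42ba5', '0',
]

# Hypothetical field names, chosen only for this illustration.
NAMES = ['action', 'server', 'timestamp', 'client_ip', 'dn',
         'host', 'method', 'url', 'request_id', 'status']

# Pair each name with the value in the same position.
record = dict(zip(NAMES, parsed))

# Split each 'KEY=value' item at the first '=' only, keeping the order
# and the repeated OU/DC keys that a dict would collapse.
record['dn'] = [tuple(item.split('=', 1)) for item in record['dn']]

print(record['client_ip'])   # 162.44.245.32
print(record['dn'][0])       # ('CN', 'dddd cojack (890)')
```

With the pairs in hand, all organizational units can be collected with a list comprehension such as [v for k, v in record['dn'] if k == 'OU'].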
Find more about pyparsing at http://pyparsing.wikispaces.com.
-- Paul