splitting the words of a line

Paul McGuire ptmcg at austin.rr.com
Fri Dec 7 09:47:05 EST 2007


On Dec 6, 9:21 am, Sumit <sumit.na... at gmail.com> wrote:
> Hi,
>            I am trying to split a line which has the format below,
>
> AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] "162.44.245.32 CN=dddd
> cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk
> Secure,DC=customer,DC=rxcorp,DC=com" "plysmhc03zp GET /mci/performance/
> SelectProducts.aspx?
> p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]
>

As John Machin mentioned, pyparsing may be helpful to you.  Here is a
simple version:

# The log record is a single line; adjacent string literals keep it
# on one logical line (quotedString will not match across newlines).
data = ('AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] '
    '"162.44.245.32 CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,'
    'OU=kkk Secure,DC=customer,DC=rxcorp,DC=com" '
    '"plysmhc03zp GET /mci/performance/SelectProducts.aspx?'
    'p=0&V=C&a=29&menu=adhoc" '
    '[d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]')

# Version 1 - simple
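from pyparsing import *

# punctuation that must match but is dropped from the results
LBRACK,RBRACK,COMMA = map(Suppress,"[],")
num = Word(nums)
# dd/Mon/yyyy:hh:mm:ss followed by a signed timezone offset
date = Combine(num+"/"+Word(alphas)+"/"+num+":"+num+":"+num+":"+num) + \
        oneOf("+ -") + num
# return the date as the original matched text, not as separate tokens
date.setParseAction(keepOriginalText)
# hex groups joined by dashes, returned as one string
uuid = delimitedList(Word(hexnums),"-",combine=True)
logString = Word(alphas,alphanums) + Word(alphas,alphanums) + \
    LBRACK + date + RBRACK + quotedString + quotedString + \
    LBRACK + uuid + RBRACK + LBRACK + Word(nums) + RBRACK

print(logString.parseString(data))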

Prints out:
['AzAccept', 'PLYSSTM01', '23/Sep/2005:16:14:28 -0500',
'"162.44.245.32 CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk Secure,DC=customer,DC=rxcorp,DC=com"',
'"plysmhc03zp GET /mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc"',
'd4b62ca2-09a0-4334622b-0e1c-03c42ba5', '0']
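
If pyparsing is not at hand, roughly the same flat split as Version 1
can be sketched with the standard re module. This is a
regular-expression stand-in, not the pyparsing grammar above: TOKEN
and split_log_line are names made up for this sketch, and it assumes
quoted strings contain no embedded double quotes and bracketed fields
contain no ']'.

```python
import re

# Hypothetical stand-in for the grammar: one alternative per token kind.
TOKEN = re.compile(r'''
    \[([^\]]*)\]     # bracketed field, brackets dropped
  | ("[^"]*")        # double-quoted string, quotes kept
  | (\S+)            # any other run of non-whitespace
''', re.VERBOSE)

def split_log_line(line):
    """Return the fields of one log line as a flat list of strings."""
    fields = []
    for m in TOKEN.finditer(line):
        # exactly one of the three groups is set for each match
        fields.append(next(g for g in m.groups() if g is not None))
    return fields

line = (
    'AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] '
    '"162.44.245.32 CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,'
    'OU=kkk Secure,DC=customer,DC=rxcorp,DC=com" '
    '"plysmhc03zp GET /mci/performance/SelectProducts.aspx?'
    'p=0&V=C&a=29&menu=adhoc" '
    '[d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]'
)
fields = split_log_line(line)
for field in fields:
    print(field)
```

Unlike the grammar, this accepts any mix of tokens and does not check
that the date or the id has the expected shape, so it is only
suitable when the input is known to be well formed.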


And here is a slightly fancier version, which also parses the contents
of the two quoted strings (it uses the pprint pretty-printing module to
show the structure of the parsed results):

# Version 2 - fancy
from pyparsing import *
LBRACK,RBRACK,COMMA = map(Suppress,"[],")
num = Word(nums)
date = Combine(num+"/"+Word(alphas)+"/"+num+":"+num+":"+num+":"+num) + \
        oneOf("+ -") + num
date.setParseAction(keepOriginalText)
uuid = delimitedList(Word(hexnums),"-",combine=True)

# grammar for the contents of the first quoted string:
# an IP address followed by comma-separated KEY=value pairs
ipAddr = delimitedList(Word(nums),".",combine=True)
keyExpr = Word(alphas.upper())
valExpr = CharsNotIn(',')
qs1Expr = ipAddr + Group(delimitedList(Combine(keyExpr + '=' + valExpr)))

def parseQS1(t):
    # re-parse the unquoted text with the nested grammar
    return qs1Expr.parseString(t[0])
def parseQS2(t):
    # the second quoted string is just whitespace-separated fields
    return t[0].split()

# parse actions run in order: strip the quotes, then parse what is left
qs1 = quotedString.copy().setParseAction(removeQuotes, parseQS1)
qs2 = quotedString.copy().setParseAction(removeQuotes, parseQS2)

logString = Word(alphas,alphanums) + Word(alphas,alphanums) + \
    LBRACK + date + RBRACK + qs1 + qs2 + \
    LBRACK + uuid + RBRACK + LBRACK + Word(nums) + RBRACK

from pprint import pprint
pprint(logString.parseString(data).asList())

Prints:
['AzAccept',
 'PLYSSTM01',
 '23/Sep/2005:16:14:28 -0500',
 '162.44.245.32',
 ['CN=dddd cojack (890)',
  'OU=1',
  'OU=Customers',
  'OU=ISM-Users',
  'OU=kkk Secure',
  'DC=customer',
  'DC=rxcorp',
  'DC=com'],
 'plysmhc03zp',
 'GET',
 '/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc',
 'd4b62ca2-09a0-4334622b-0e1c-03c42ba5',
 '0']
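
To make the second stage easier to follow, here is a rough
plain-Python sketch of what the two parse actions in Version 2 do to
the quoted strings. It uses only str methods rather than pyparsing;
parse_identity and parse_request are hypothetical names for this
sketch, and like valExpr it assumes that no value contains a comma.

```python
def parse_identity(quoted):
    """Split '"<ip> KEY=value,KEY=value,..."' into (ip, [pairs])."""
    body = quoted.strip('"')            # drop the surrounding quotes
    ip, _, rest = body.partition(' ')   # the IP address ends at the first space
    return ip, rest.split(',')          # assumes values never contain commas

def parse_request(quoted):
    """Split '"<host> <method> <url>"' on whitespace."""
    return quoted.strip('"').split()

ip, pairs = parse_identity(
    '"162.44.245.32 CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,'
    'OU=kkk Secure,DC=customer,DC=rxcorp,DC=com"')
request = parse_request(
    '"plysmhc03zp GET /mci/performance/SelectProducts.aspx?'
    'p=0&V=C&a=29&menu=adhoc"')
print(ip)
print(pairs)
print(request)
```

The pyparsing version has the advantage that a malformed IP address
or key raises a ParseException instead of passing through silently.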

Find more about pyparsing at http://pyparsing.wikispaces.com.

-- Paul




