Logfile analysing with pyparsing

Paul McGuire ptmcg at austin.rr._bogus_.com
Tue Sep 26 14:09:23 EDT 2006


"Andi Clemens" <andi.clemens at gmx.net> wrote in message 
news:efadbv$gq7$1 at online.de...
> Hi,
>
> we had some problems in the last weeks with our mailserver.
> Some messages were not delivered and we wanted to know why.
> But looking through the logfile is a time consuming process.
> So I wanted to write a parser to analyse the logs and parse them as XML.
>
<snip>

Andi -

Well, pyparsing does have *some* XML connection, but I don't think it will 
be as direct as you might like.  Below is a pyparsing program that will 
probably parse 90% of your log messages and give you easy-to-access data 
fields.  You can use those fields to build your own Python data structures, 
such as a dict keyed by queue id or a dict keyed by message-id, and then 
navigate through them to generate your XML.
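
As a rough sketch of that last step, assuming each parsed line has already 
been copied into a plain dict of its named fields, the grouping and XML 
generation might look something like this.  The records list is a 
hand-written stand-in for parser output, and the element names (maillog, 
message, entry, field) are made up for illustration; only the standard 
library is used.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for parsed log lines: plain dicts holding the
# named fields pulled out of each line (values taken from the sample log).
records = [
    {"queueId": "755387301", "command": "postfix/cleanup",
     "message-id": "200609180214.k8I2EuNo016264 at mforward2.dtag.de"},
    {"queueId": "755387301", "command": "postfix/qmgr",
     "from": "sender at mail.net.mx", "size": "8152"},
    {"queueId": "EF90720AD", "command": "postfix/smtp",
     "to": "SPAM-FOUND at OUR-MAILSERVER.mail.com"},
]

# Index the records by queue id so all lines for one message sit together.
by_queue = defaultdict(list)
for rec in records:
    by_queue[rec["queueId"]].append(rec)

# Walk the index and emit one <message> element per queue id, with one
# <entry> per log line and one <field> per remaining name/value pair.
root = ET.Element("maillog")
for queue_id, entries in by_queue.items():
    msg = ET.SubElement(root, "message", queueId=queue_id)
    for rec in entries:
        entry = ET.SubElement(msg, "entry", command=rec["command"])
        for name, value in rec.items():
            if name not in ("queueId", "command"):
                ET.SubElement(entry, "field", name=name).text = value

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

A second dict keyed by message-id could be built the same way, which would 
let you tie together the different queue ids that one message passes 
through.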

-- Paul

logdata = """\
Sep 18 04:15:22 mailrelay postfix/cleanup[12103]: 755387301: message-id=<200609180214.k8I2EuNo016264 at mforward2.dtag.de>
Sep 18 04:15:22 mailrelay spamd[1364]: spamd: processing message <200609180214.k8I2EuNo016264 at mforward2.dtag.de> for nobody:65534
Sep 18 04:15:25 mailrelay spamd[1364]: spamd: result: Y 15 - BAYES_99,DATE_IN_PAST_03_06,DNS_FROM_RFC_ABUSE,DNS_FROM_RFC_DSN,DNS_FROM_RFC_POST,DNS_FROM_RFC_WHOIS,FORGED_MUA_OUTLOOK,SPF_SOFTFAIL scantime=3.1,size=8086,user=nobody,uid=65534,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=55277,mid=<200609180214.k8I2EuNo016264 at mforward2.dtag.de>,bayes=1,autolearn=no
Sep 18 04:15:25 mailrelay postfix/cleanup[12074]: DA1431965E: message-id=<200609180214.k8I2EuNo016264 at mforward2.dtag.de>
Sep 18 04:15:26 mailrelay postfix/cleanup[13057]: EF90720AD: message-id=<200609180214.k8I2EuNo016264 at mforward2.dtag.de>
Sep 18 04:15:26 mailrelay postfix/smtp[10879]: EF90720AD: to=<SPAM-FOUND at OUR-MAILSERVER.mail.com>, relay=10.49.0.7[10.49.0.7], delay=1, status=sent (250 2.6.0 <200609180214.k8I2EuNo016264 at mforward2.dtag.de> Queued mail for delivery)

Sep 18 02:15:11 mailrelay postfix/smtpd[10841]: 755387301: client=unknown[194.25.242.123]
Sep 18 04:15:22 mailrelay postfix/cleanup[12103]: 755387301: message-id=<200609180214.k8I2EuNo016264 at mforward2.dtag.de>
Sep 18 04:15:22 mailrelay postfix/qmgr[11082]: 755387301: from=<sender at mail.net.mx>, size=8152, nrcpt=7 (queue active)
Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301: to=<receiver1 at mail.com>, relay=procmail, delay=14, status=sent (filter)
Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301: to=<receiver2 at mail.com>, relay=procmail, delay=14, status=sent (filter)
Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301: to=<receiver3 at mail.com>, relay=procmail, delay=14, status=sent (filter)
Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301: to=<receiver4 at mail.com>, relay=procmail, delay=14, status=sent (filter)
Sep 18 04:15:25 mailrelay postfix/qmgr[11082]: 755387301: removed

Sep 18 04:15:25 mailrelay postfix/pickup[13175]: DA1431965E: uid=65534 from=<nobody>
Sep 18 04:15:25 mailrelay postfix/cleanup[12074]: DA1431965E: message-id=<200609180214.k8I2EuNo016264 at mforward2.dtag.de>
Sep 18 04:15:25 mailrelay postfix/qmgr[11082]: DA1431965E: from=<nobody at OUR-MAILSERVER.mail.com>, size=11074, nrcpt=1 (queue active)
Sep 18 04:15:26 mailrelay postfix/smtp[11703]: DA1431965E: to=<SPAM-FOUND at OUR-MAILSERVER.mail.com>, relay=localhost[127.0.0.1], delay=1, status=sent (250 Ok: queued as EF90720AD)
Sep 18 04:15:26 mailrelay postfix/qmgr[11082]: DA1431965E: removed

Sep 18 04:15:25 mailrelay postfix/smtpd[11704]: EF90720AD: client=localhost[127.0.0.1]
Sep 18 04:15:26 mailrelay postfix/cleanup[13057]: EF90720AD: message-id=<200609180214.k8I2EuNo016264 at mforward2.dtag.de>
Sep 18 04:15:26 mailrelay postfix/smtp[11703]: DA1431965E: to=<SPAM-FOUND at OUR-MAILSERVER.mail.com>, relay=localhost[127.0.0.1], delay=1, status=sent (250 Ok: queued as EF90720AD)
Sep 18 04:15:26 mailrelay postfix/qmgr[11082]: EF90720AD: from=<nobody at OUR-MAILSERVER.mail.com>, size=11263, nrcpt=1 (queue active)
Sep 18 04:15:26 mailrelay postfix/smtp[10879]: EF90720AD: to=<SPAM-FOUND at OUR-MAILSERVER.mail.com>, relay=10.49.0.7[10.49.0.7], delay=1, status=sent (250 2.6.0 <200609180214.k8I2EuNo016264 at mforward2.dtag.de> Queued mail for delivery)
Sep 18 04:15:26 mailrelay postfix/qmgr[11082]: EF90720AD: removed
""".split('\n')

from pyparsing import *

month = oneOf("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec")
dayOfMonth = Word(nums,max=2)
timeOfDay = Combine(Word(nums,exact=2)+":"+
                Word(nums,exact=2)+":"+Word(nums,exact=2))
timeStamp = month + dayOfMonth + timeOfDay
# may need to expand this if log contains other entries in this field
source = Literal("mailrelay")
emailAddr = QuotedString("<",endQuoteChar=">")
ipAddr = Combine(Word(nums)+"."+Word(nums)+"."+\
                Word(nums)+"."+Word(nums))
ipRef = ( "localhost" | ipAddr ) + "[" + ipAddr + "]"

command = Combine(Word(alphas) + Optional("/" + Word(alphas)))
pid = "[" + Word(nums) + "]"
queueId = Word(hexnums)
integer = Word(nums)
msgValue = ( integer | emailAddr | ipRef | Word(alphas) ) + \
            Optional( QuotedString("(",endQuoteChar=")") )
nvList = Dict(delimitedList( Group( Word(alphas+"-") +
                                        Suppress("=") + msgValue ) ))
msgBody = "removed" | nvList
spamdMsg = "spamd:" + restOfLine
regularMsg = queueId.setResultsName("queueId") + ":" + \
            msgBody.setResultsName("body")
logMessage = timeStamp + source + command.setResultsName("command") +\
            pid.setResultsName("pid") + ":" + (spamdMsg | regularMsg)

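# parse each line in log (blank separator lines are skipped)
for log in logdata:
    if log:
        results = logMessage.parseString(log)
        print(results.dump())
        # show a few fields of interest; a field the line did not set
        # raises KeyError and is printed as an empty value
        for fieldName in "message-id queueId from to".split():
            print(fieldName, ":", end=" ")
            try:
                print(results[fieldName])
            except KeyError:
                print()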


Prints out (excerpt):
- body: ['message-id', '200609180214.k8I2EuNo016264 at mforward2.dtag.de']
- command: postfix/cleanup
- message-id: 200609180214.k8I2EuNo016264 at mforward2.dtag.de
- pid: ['[', '13057', ']']
- queueId: EF90720AD
['Sep', '18', '04:15:26', 'mailrelay', 'postfix/cleanup', '[', '13057', ']', 
':', 'EF90720AD', ':', ['message-id', 
'200609180214.k8I2EuNo016264 at mforward2.dtag.de']]
message-id : 200609180214.k8I2EuNo016264 at mforward2.dtag.de
queueId : EF90720AD
from :
to :
- body: ['to', 'SPAM-FOUND at OUR-MAILSERVER.mail.com']
- command: postfix/smtp
- pid: ['[', '10879', ']']
- queueId: EF90720AD
- relay: 10
- to: SPAM-FOUND at OUR-MAILSERVER.mail.com
['Sep', '18', '04:15:26', 'mailrelay', 'postfix/smtp', '[', '10879', ']', 
':', 'EF90720AD', ':', ['to', 'SPAM-FOUND at OUR-MAILSERVER.mail.com'], 
['relay', '10']]
message-id :
queueId : EF90720AD
from :
to : SPAM-FOUND at OUR-MAILSERVER.mail.com
- body: ['client', 'unknown']
- client: unknown
- command: postfix/smtpd
- pid: ['[', '10841', ']']
- queueId: 755387301
['Sep', '18', '02:15:11', 'mailrelay', 'postfix/smtpd', '[', '10841', ']', 
':', '755387301', ':', ['client', 'unknown']]
message-id :
queueId : 755387301
from :
to :
- body: ['message-id', '200609180214.k8I2EuNo016264 at mforward2.dtag.de']
- command: postfix/cleanup
- message-id: 200609180214.k8I2EuNo016264 at mforward2.dtag.de
- pid: ['[', '12103', ']']
- queueId: 755387301
['Sep', '18', '04:15:22', 'mailrelay', 'postfix/cleanup', '[', '12103', ']', 
':', '755387301', ':', ['message-id', 
'200609180214.k8I2EuNo016264 at mforward2.dtag.de']]
message-id : 200609180214.k8I2EuNo016264 at mforward2.dtag.de
queueId : 755387301
from :
to :
- body: ['from', 'sender at mail.net.mx']
- command: postfix/qmgr
- from: sender at mail.net.mx
- nrcpt: ['7', 'queue active']
- pid: ['[', '11082', ']']
- queueId: 755387301
- size: 8152
['Sep', '18', '04:15:22', 'mailrelay', 'postfix/qmgr', '[', '11082', ']', 
':', '755387301', ':', ['from', 'sender at mail.net.mx'], ['size', '8152'], 
['nrcpt', '7', 'queue active']]
message-id :
queueId : 755387301
from : sender at mail.net.mx
to : 
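
One thing the excerpt shows: relay comes back as '10' rather than 
10.49.0.7[10.49.0.7], and the delay and status fields after it are lost.  
The likely cause is that msgValue tries integer before ipRef, and 
pyparsing's | operator takes the first alternative that matches, so the 
leading "10" is consumed as an integer and the comma-separated list ends 
there.  A minimal sketch of the reordering follows.  The names ip_ref, 
original_order and reordered are illustrative, and letting ip_ref accept 
any alphabetic host name (to cover unknown[...]) is an assumption, not 
something the program above does.

```python
from pyparsing import Combine, Word, alphas, nums

integer = Word(nums)
ip_addr = Combine(Word(nums) + "." + Word(nums) + "." +
                  Word(nums) + "." + Word(nums))
# Host name or address followed by a bracketed address,
# e.g. 10.49.0.7[10.49.0.7] or unknown[194.25.242.123].
# Assumption: host names here are purely alphabetic.
ip_ref = Combine((ip_addr | Word(alphas)) + "[" + ip_addr + "]")

# "|" builds a first-match alternation, so the order of terms matters.
original_order = integer | ip_ref | Word(alphas)
reordered = ip_ref | integer | Word(alphas)

sample = "10.49.0.7[10.49.0.7]"
first = original_order.parseString(sample)[0]   # integer wins on "10"
second = reordered.parseString(sample)[0]       # full relay reference
plain = reordered.parseString("8152")[0]        # plain integers still parse

print(first)
print(second)
print(plain)
```

Putting the more specific expression first is usually enough here; the 
longest-match operator ^ is another option, at the cost of trying every 
alternative.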




