Help on Email Parsing
Jeremy Sanders
jeremy+plusnews at jeremysanders.net
Mon Feb 23 05:06:39 EST 2004
On Mon, 23 Feb 2004 00:47:17 -0800, dont bother wrote:
>> I have been trying to parse emails:
> But I could not find any examples or snippets of parsing emails in
> python from the documentation.
Here is a simple program (a bit of a hack) I wrote to count the number of
messages received in a mailbox on each day (used for counting spam). It may
be of some use to you, although I don't actually parse the message itself,
only the headers.
Jeremy
# Released under the GPL (version 2 or greater)
# Copyright (C) 2003 Jeremy Sanders
import mailbox
import string
import email
import email.Utils
import time
import sys
# open passed mailbox filename
# (yes - we need checking of this)
fp = open(sys.argv[1], 'r')
# open mailbox from file
mbox = mailbox.PortableUnixMailbox(fp)
secsinday = 86400
counts = {}
# get current time
nowtime = time.time()
# iterate over mail messages
while 1:
    # get next message
    msg = mbox.next()
    # exit if we've looked at the last one
    if msg == None:
        break
    # get received header
    received = msg.get('received')
    # skip messages with no received header
    if received == None:
        continue
    # get unix time of email
    date_rfind = string.rfind(received, ';')
    date = received[date_rfind+1:]
    pd = email.Utils.parsedate( string.strip(date) )
    # skip messages we can't parse the date on
    if pd == None:
        continue
    # get time between now and received date in message
    unixtime = time.mktime(pd)
    day = int( (unixtime-nowtime) / secsinday)
    # increment counter for day
    # (using a dict allows us to parse the messages only once)
    if not day in counts:
        counts[day] = 0
    counts[day] += 1
# sort days into numerical order
daylist = counts.keys()
daylist.sort()
# print out counts
for d in daylist:
    print d, counts[d]
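
The script above targets the Python 2 of its day: the print statement, the string module functions, email.Utils and mailbox.PortableUnixMailbox are not available in Python 3. For readers on Python 3, the same idea might be sketched roughly as follows. This is not the author's code; the names received_time, count_by_day and main are made up for illustration. It also swaps in email.utils.parsedate_tz with mktime_tz, so the timezone offset in the Received header is honoured, whereas the original treats the parsed date as local time.

```python
import email.utils
import mailbox
import time
from collections import Counter

SECS_IN_DAY = 86400


def received_time(msg):
    """Return the Unix time taken from the first Received header, or None.

    Hypothetical helper, not part of the original script.
    """
    received = msg.get('Received')
    if received is None:
        return None
    # the date is whatever follows the last semicolon in the header
    date = str(received).rsplit(';', 1)[-1].strip()
    parsed = email.utils.parsedate_tz(date)
    if parsed is None:
        return None
    # assumption: respect the timezone offset (the original ignores it)
    return email.utils.mktime_tz(parsed)


def count_by_day(messages, now=None):
    """Count messages by whole days between `now` and their received time.

    Keys are zero or negative for messages received in the past,
    as in the original script.
    """
    if now is None:
        now = time.time()
    counts = Counter()
    for msg in messages:
        t = received_time(msg)
        if t is None:
            # no Received header, or a date we can't parse
            continue
        counts[int((t - now) / SECS_IN_DAY)] += 1
    return counts


def main(path):
    """Print the day offsets and counts for the mbox file at `path`."""
    for day, n in sorted(count_by_day(mailbox.mbox(path)).items()):
        print(day, n)
```

To use it on a mailbox file, call main(sys.argv[1]) after importing sys. Passing `now` explicitly to count_by_day makes the result repeatable, which is handy for testing.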