Help on Email Parsing

Jeremy Sanders jeremy+plusnews at jeremysanders.net
Mon Feb 23 05:06:39 EST 2004


On Mon, 23 Feb 2004 00:47:17 -0800, dont bother wrote:

>> I have been trying to parse emails:
> But I could not find any examples or snippets of parsing emails in
> python from the documentation.

Here is a simple program (a bit of a hack) I wrote to count the number of
messages in a mailbox received on each day (I use it for counting spam).
It may be of some use to you, although it doesn't parse the message body
itself, only the headers.

Jeremy

# Released under the GPL (version 2 or greater)
# Copyright (C) 2003 Jeremy Sanders

import mailbox
import string
import email
import email.Utils
import time
import sys

# open passed mailbox filename
# (yes - we need checking of this)
fp = open(sys.argv[1], 'r')

# open mailbox from file
mbox = mailbox.PortableUnixMailbox(fp)

secsinday = 86400
counts = {}

# get current time
nowtime = time.time()

# iterate over mail messages
while 1:
    # get next message
    msg = mbox.next()
    # exit if we've looked at the last one
    if msg is None:
        break

    # get received header
    received = msg.get('received')
    # skip messages with no received header
    if received is None:
        continue

    # get unix time of email: the date is the text after the last ';'
    # in the received header
    date_rfind = received.rfind(';')
    date = received[date_rfind+1:]
    pd = email.Utils.parsedate(date.strip())

    # skip messages we can't parse the date on
    if pd is None:
        continue

    # get day offset of the received date relative to now
    # (negative means in the past; int() truncates toward zero)
    unixtime = time.mktime(pd)
    day = int( (unixtime-nowtime) / secsinday)

    # increment counter for day
    # (using a dict allows us to parse the messages only once)
    if day not in counts:
        counts[day] = 0
    counts[day] += 1

# sort days into numerical order
daylist = counts.keys()
daylist.sort()

# print out counts
for d in daylist:
    print d, counts[d]
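
The script above needs Python 2 (mailbox.PortableUnixMailbox, email.Utils,
the string module functions and the print statement are all gone in
Python 3). Below is a rough Python 3 sketch of the same idea, not a drop-in
replacement. The names received_timestamp, count_by_day and SECS_IN_DAY are
made up for illustration. It also swaps time.mktime() for
email.utils.mktime_tz(), on the assumption that you want the UTC offset in
the Received header to be honoured rather than treated as local time.

```python
import mailbox
import time
from collections import Counter
from email.utils import parsedate_tz, mktime_tz

SECS_IN_DAY = 86400  # hypothetical name, same value as secsinday above


def received_timestamp(received):
    """Return the Unix time found in a Received header, or None."""
    # The date is the text after the last ';' in the header.
    _, sep, date_text = received.rpartition(';')
    if not sep:
        return None
    parsed = parsedate_tz(date_text.strip())
    if parsed is None:
        return None
    # Assumption: respect the header's UTC offset (the original script
    # used time.mktime(), which interprets the date as local time).
    return mktime_tz(parsed)


def count_by_day(path, now=None):
    """Count messages per day offset from `now` (0 = today, -1, -2, ...)."""
    if now is None:
        now = time.time()
    counts = Counter()
    mbox = mailbox.mbox(path, create=False)
    try:
        for msg in mbox:
            # get() returns the first Received header, as in the original
            received = msg.get('Received')
            if received is None:
                continue
            stamp = received_timestamp(str(received))
            if stamp is None:
                continue
            # int() truncates toward zero, matching the original script
            counts[int((stamp - now) / SECS_IN_DAY)] += 1
    finally:
        mbox.close()
    return counts
```

To print the counts in day order, as the original does, loop over
sorted(count_by_day(sys.argv[1]).items()) and print each (day, count) pair.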




