[Spambayes-checkins] spambayes/utilities export_apple_mail.py, NONE, 1.1
Tony Meyer
anadelonbrin at users.sourceforge.net
Tue Jan 3 03:47:28 CET 2006
Update of /cvsroot/spambayes/spambayes/utilities
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19774/utilities
Added Files:
export_apple_mail.py
Log Message:
Simple utility to convert an Apple Mail 2.x user's ~/Library/Mail folder of .emlx
files to the standard spambayes testtools format.
(Original files are not altered).
--- NEW FILE: export_apple_mail.py ---
#!/usr/bin/env python
"""export_apple_mail.py
Converts Apple Mail's emlx files to plain text files usable
by SpamBayes's testtools.
Adding some way to display help would be good. For now, read
this file and run the script with the path to the user's
~/Library/Mail directory.
(Tested on Windows XP remotely accessing the Mac filesystem.
I don't know if the bundling of the files in the Mail directory
would effect this script or not, and can't be bothered finding
out right now).
"""
import os
import sys
from spambayes.Options import options
def emlx_to_rfc2822(in_fn, out_fn):
"""Convert an individual file in Apple Mail's emlx format
to a file with just the RFC2822 message.
The emlx format is simply the length of the message (as a
string) on the first line, then the raw message text, then
the contents of a plist (XML) file that contains data that
Mail uses (subject, flags, sender, and so forth). We ignore
this plist data).
"""
fin = file(in_fn)
fout = file(out_fn, "w")
length = int(fin.readline().rstrip())
fout.write(fin.read(length))
plist = fin.read()
def export(mail_dir):
"""Scans through the specified directory, which should be
the Apple Mail user's ~\Library\Mail folder, converting
all found emlx files to simple RFC2822 messages suitable
for use with the SpamBayes testtools.
Messages are copied (the originals are left untouched) into
the standard SpamBayes testtools setup (all files are put in the
reservoir; use rebal.py to distribute).
The script assumes that all messages outside of Mail's
Junk folder are ham, and all messages inside the Junk folder
are spam.
Any messages in the "Sent Messages" folders are skipped.
A simple extension of this function would allow only certain
accounts/mailboxes to be exported.
"""
for dirname in os.listdir(mail_dir):
# There is no mail at the top level.
dirname = os.path.join(mail_dir, dirname)
if os.path.isdir(dirname):
export_directory(mail_dir, dirname)
print
def export_directory(parent, dirname):
if parent == "Junk.mbox":
# All of these should be spam. Make sure that you
# check for false positives first!
dest_dir = os.path.join(\
os.path.dirname(options["TestDriver", "spam_directories"]),
"reservoir")
elif parent == "Sent Messages.mbox" or parent == "Drafts.mbox":
# We don't do anything with outgoing mail.
return
else:
# Everything else is ham.
dest_dir = os.path.join(\
os.path.dirname(options["TestDriver", "ham_directories"]),
"reservoir")
dest_dir = os.path.normpath(dest_dir)
for path in os.listdir(dirname):
path = os.path.join(dirname, path)
if os.path.isdir(path):
export_directory(dirname, path)
else:
fn, ext = os.path.splitext(path)
if ext == ".emlx":
in_fn = os.path.join(dirname, path)
out_fn = os.path.join(dest_dir,
os.path.basename(fn) + ".txt")
emlx_to_rfc2822(in_fn, out_fn)
sys.stdout.write('.')
if __name__ == "__main__":
export(sys.argv[1])
More information about the Spambayes-checkins
mailing list