[Python Wpg] remove dup mails
Stuart Williams
stuartw at mts.net
Wed Oct 31 22:09:50 EDT 2007
This looks great! I can't think of significantly better ways of doing
it. Here are some style suggestions.
Follow PEP 8 (http://www.python.org/dev/peps/pep-0008/) for
indentation, etc. Note that __delitem__ is a special method name: it is
what Python calls to implement "del obj[key]" on dict-like objects, so
it is normally not called directly. See how I used "del" below
(untested). Also, file objects follow the new context management
protocol, which lets you use them with "with", a la
http://docs.python.org/whatsnew/pep-343.html, and get rid of the try
and both closes. Lastly, dicts support the "in" operator, which can
replace has_key().
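To illustrate the "del" and "in" idioms in isolation, here is a small
sketch written for a current Python 3 interpreter. The sample message
text and the contents of the dups dict are made up for the
demonstration and are not taken from Peter's mail.

```python
import email

# Made-up sample message with two Received headers (illustration only).
raw = (
    "Received: from a.example by b.example\n"
    "Received: from c.example by d.example\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
msg = email.message_from_string(raw)

# "del" invokes msg.__delitem__('Received') behind the scenes and
# removes every occurrence of the header, not just the first one.
del msg['Received']
remaining_headers = msg.keys()

# Membership tests on a dict use "in" rather than has_key().
dups = {'abc123': 'some/path'}  # hypothetical digest -> path entry
seen = 'abc123' in dups
```

After the del, only the Subject header should be left on the message.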
So here's a slightly different version:
#! /usr/bin/python
from __future__ import with_statement
import os
import sys
import email
import hashlib

dups = {}  # maps message digest -> first path seen with that content

for root, dirs, files in os.walk('a'):
    for fname in files:
        # The with statement closes the file even if parsing fails.
        with open(os.path.join(root, fname)) as fobj:
            msg = email.message_from_file(fobj)
        # Drop every Received header so the copies hash identically.
        del msg['Received']
        hash = hashlib.md5(msg.as_string()).hexdigest()
        if hash not in dups:
            dups[hash] = os.path.join(root, fname)
        else:
            print 'unlink'
            # os.unlink(os.path.join(root, fname))
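For anyone running a current Python 3, the same idea might look roughly
like the sketch below. It is an assumption-laden variant, not a drop-in
replacement: the function name find_duplicates is made up, the files are
read in binary mode, and the function only reports duplicates instead of
deleting them.

```python
import email
import hashlib
import os


def find_duplicates(top):
    """Return paths under `top` whose messages duplicate an earlier one.

    Two messages count as duplicates when they are identical apart
    from their "Received" headers. Nothing is deleted here.
    """
    seen = {}        # digest -> first path seen with that content
    duplicates = []  # later paths whose digest was already seen
    for root, dirs, files in os.walk(top):
        # Sorting makes the "first copy wins" choice repeatable
        # within each directory.
        for fname in sorted(files):
            path = os.path.join(root, fname)
            # Binary mode avoids guessing the encoding of the mail file.
            with open(path, 'rb') as fobj:
                msg = email.message_from_binary_file(fobj)
            del msg['Received']
            digest = hashlib.md5(msg.as_bytes()).hexdigest()
            if digest not in seen:
                seen[digest] = path
            else:
                duplicates.append(path)
    return duplicates
```

Once the returned list looks right, calling os.unlink on each path in
it would do the actual removal.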
On 10/31/07, Peter O'Gorman <peter at pogma.com> wrote:
> I mentioned at the meeting that fetchmail went mad and downloaded my
> mail messages repeatedly leaving me with multiple copies of several
> hundred messages. The files were not identical, but only differed in
> "Received" headers. This is the Python script I came up with (it took a
> while, I had to spend a good deal of time reading the docs).
>
> I'm sure that there are better ways to do this, and would not mind a
> critique, but this did work.
>
> Thanks,
> Peter
>
> #! /usr/bin/python
> import os
> import sys
> import email
> import hashlib
>
> dups = {}
>
> for root, dirs, files in os.walk('/home/pogma/Maildir'):
>     for fname in files:
>         try:
>             fobj = open(os.path.join(root,fname))
>             msg = email.message_from_file(fobj)
>             fobj.close()
>         except:
>             fobj.close()
>             continue
>         msg.__delitem__('Received')
>         hash = hashlib.md5(msg.as_string()).hexdigest()
>         if not dups.has_key(hash):
>             dups[hash] = os.path.join(root,fname)
>         else:
>             os.unlink(os.path.join(root,fname))