[Python Wpg] remove dup mails
Stuart Williams
stuartw at mts.net
Wed Oct 31 22:16:58 EDT 2007
One more thought, not Python-related. Wouldn't msg['Message-id']
reliably replace the hash as a unique handle on the message?
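
A minimal sketch of that idea, assuming most messages carry a usable
Message-ID header and falling back to the Received-stripped hash when the
header is missing. The names dedup_key and find_duplicates are made up
for illustration, and it takes the raw message text rather than walking
a Maildir. Unlike the script quoted below, it is written for Python 3.

```python
import email
import hashlib


def dedup_key(raw_text):
    """Return a key that is equal for copies of the same message."""
    msg = email.message_from_string(raw_text)
    message_id = msg['Message-ID']  # header lookup is case-insensitive
    if message_id:
        return 'id:' + message_id.strip()
    # Assumed fallback: no Message-ID, so hash the message with every
    # Received header removed, as the quoted script does.
    del msg['Received']
    text = msg.as_string().encode('utf-8', 'replace')
    return 'md5:' + hashlib.md5(text).hexdigest()


def find_duplicates(messages):
    """messages: iterable of (name, raw_text) pairs.

    Returns the names of every copy after the first one seen.
    """
    seen = {}
    duplicates = []
    for name, raw_text in messages:
        key = dedup_key(raw_text)
        if key in seen:
            duplicates.append(name)
        else:
            seen[key] = name
    return duplicates
```

The prefixes 'id:' and 'md5:' only keep the two kinds of key from ever
colliding. Whether Message-ID is trustworthy enough to delete mail on is
still the open question; printing the duplicates first seems prudent.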
On 10/31/07, Stuart Williams <stuartw at mts.net> wrote:
> This looks great! I can't think of significantly better ways of doing
> it. Here are some style suggestions.
>
> Follow PEP 8 (http://www.python.org/dev/peps/pep-0008/) for
> indentation, etc. Note that __delitem__ is a special method name: it is
> what Python calls to implement "del obj[key]", so use the del statement
> rather than calling it directly; see how I used it below (untested).
> Also, file is one of the objects that follows the new context
> management protocol, which allows you to use it with "with" a la
> http://docs.python.org/whatsnew/pep-343.html and get rid of the try
> and both closes. Lastly, dicts support the "in" operator.
>
> So here's a slightly different version:
>
> #! /usr/bin/python
> from __future__ import with_statement
>
> import os
> import sys
> import email
> import hashlib
>
> dups = {}
>
> for root, dirs, files in os.walk('a'):
>     for fname in files:
>         with open(os.path.join(root,fname)) as fobj:
>             msg = email.message_from_file(fobj)
>         del msg['Received']
>         hash = hashlib.md5(msg.as_string()).hexdigest()
>         if not hash in dups:
>             dups[hash] = os.path.join(root,fname)
>         else:
>             print 'unlink'
>             # os.unlink(os.path.join(root,fname))
>
>
> On 10/31/07, Peter O'Gorman <peter at pogma.com> wrote:
> > I mentioned at the meeting that fetchmail went mad and downloaded my
> > mail messages repeatedly leaving me with multiple copies of several
> > hundred messages. The files were not identical, but only differed in
> > "Received" headers. This is the Python script I came up with (it took a
> > while, I had to spend a good deal of time reading the docs).
> >
> > I'm sure that there are better ways to do this, and would not mind a
> > critique, but this did work.
> >
> > Thanks,
> > Peter
> >
> > #! /usr/bin/python
> > import os
> > import sys
> > import email
> > import hashlib
> >
> > dups = {}
> >
> > for root, dirs, files in os.walk('/home/pogma/Maildir'):
> >     for fname in files:
> >         try:
> >             fobj = open(os.path.join(root,fname))
> >             msg = email.message_from_file(fobj)
> >             fobj.close()
> >         except:
> >             fobj.close()
> >             continue
> >         msg.__delitem__('Received')
> >         hash = hashlib.md5(msg.as_string()).hexdigest()
> >         if not dups.has_key(hash):
> >             dups[hash] = os.path.join(root,fname)
> >         else:
> >             os.unlink(os.path.join(root,fname))
> > _______________________________________________
> > Winnipeg mailing list
> > Winnipeg at python.org
> > http://mail.python.org/mailman/listinfo/winnipeg
> >
>