[Python Wpg] remove dup mails

Stuart Williams stuartw at mts.net
Wed Oct 31 22:16:58 EDT 2007


One more thought, not Python-related.  Wouldn't msg['Message-id']
reliably replace the hash as a unique handle on the message?
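
A minimal sketch of that idea, for what it's worth.  It assumes a
current Python 3 rather than the 2.5-era code quoted below, and
find_duplicates is just a name I made up for illustration.  One
caveat on "reliably": the mail RFCs say a message SHOULD carry a
Message-ID, not that it must, so this sketch leaves alone any message
that lacks one instead of guessing.

```python
import email
import os


def find_duplicates(maildir):
    """Return paths of messages whose Message-ID was already seen."""
    seen = {}        # Message-ID -> first path that carried it
    duplicates = []
    for root, dirs, files in os.walk(maildir):
        dirs.sort()  # make the traversal order deterministic
        for fname in sorted(files):
            path = os.path.join(root, fname)
            with open(path, 'rb') as fobj:
                msg = email.message_from_binary_file(fobj)
            msg_id = msg['Message-ID']  # header lookup is case-insensitive
            if msg_id is None:
                continue  # no Message-ID: cannot judge, so keep the file
            msg_id = str(msg_id).strip()
            if msg_id in seen:
                duplicates.append(path)
            else:
                seen[msg_id] = path
    return duplicates
```

Calling find_duplicates('a') only returns the candidate paths; nothing
is unlinked, so the list can be checked by eye before deleting anything.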

On 10/31/07, Stuart Williams <stuartw at mts.net> wrote:
> This looks great!  I can't think of significantly better ways of doing
> it.  Here are some style suggestions.
>
> Follow PEP 8 (http://www.python.org/dev/peps/pep-0008/) for
> indentation, etc.  Note that __delitem__ is a special method name:
> it is what Python calls to implement "del obj[key]", so it's better
> to use the del statement directly, as I do below (untested).  Also,
> file is one of the objects that follows the new context management
> protocol, which allows you to use it with "with" a la
> http://docs.python.org/whatsnew/pep-343.html and get rid of the try
> and both closes.  Lastly, dicts support the "in" operator.
>
> So here's a slightly different version:
>
> #! /usr/bin/python
> from __future__ import with_statement
>
> import os
> import sys
> import email
> import hashlib
>
> dups = {}
>
> for root, dirs, files in os.walk('a'):
>     for fname in files:
>         path = os.path.join(root, fname)
>         with open(path) as fobj:
>             msg = email.message_from_file(fobj)
>         # drop every Received header so copies hash identically
>         del msg['Received']
>         # "digest" avoids shadowing the built-in hash()
>         digest = hashlib.md5(msg.as_string()).hexdigest()
>         if digest not in dups:
>             dups[digest] = path
>         else:
>             # dry run: report instead of deleting
>             print 'unlink'
>             # os.unlink(path)
>
>
> On 10/31/07, Peter O'Gorman <peter at pogma.com> wrote:
> > I mentioned at the meeting that fetchmail went mad and downloaded my
> > mail messages repeatedly leaving me with multiple copies of several
> > hundred messages. The files were not identical, but only differed in
> > "Received" headers. This is the Python script I came up with (it took a
> > while, I had to spend a good deal of time reading the docs).
> >
> > I'm sure that there are better ways to do this, and would not mind a
> > critique, but this did work.
> >
> > Thanks,
> > Peter
> >
> > #! /usr/bin/python
> > import os
> > import sys
> > import email
> > import hashlib
> >
> > dups = {}
> >
> > for root, dirs, files in os.walk('/home/pogma/Maildir'):
> >   for fname in files:
> >     try:
> >       fobj = open(os.path.join(root,fname))
> >       msg = email.message_from_file(fobj)
> >       fobj.close()
> >     except:
> >       fobj.close()
> >       continue
> >     msg.__delitem__('Received')
> >     hash = hashlib.md5(msg.as_string()).hexdigest()
> >     if not dups.has_key(hash):
> >       dups[hash] = os.path.join(root,fname)
> >     else:
> >       os.unlink(os.path.join(root,fname))
> > _______________________________________________
> > Winnipeg mailing list
> > Winnipeg at python.org
> > http://mail.python.org/mailman/listinfo/winnipeg
> >
>


