[Python Wpg] remove dup mails

Stuart Williams stuartw at mts.net
Wed Oct 31 22:09:50 EDT 2007


This looks great!  I can't think of significantly better ways of doing
it.  Here are some style suggestions.

Follow PEP 8 (http://www.python.org/dev/peps/pep-0008/) for
indentation, etc.  Note that __delitem__ is a special method name: it
is the hook Python calls to implement "del obj[key]", so it is rarely
called directly; see how I used del below (untested).  Also, file
is one of the objects that follows the new context management
protocol, which allows you to use it with "with" a la
http://docs.python.org/whatsnew/pep-343.html and get rid of the try
and both closes.  Lastly, dicts support the "in" operator, which
replaces has_key().
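
As a small illustration of the del and "in" points, here is a minimal
sketch.  It is not part of the script: the two sample messages and the
helper name digest_without_received are made up for the example, and it
assumes Python 3, where md5 wants bytes rather than str.

```python
import email
import hashlib

# Two hypothetical copies of one message that differ only in Received headers.
RAW_A = (
    "Received: from hop1.example.org\n"
    "Received: from hop2.example.org\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
RAW_B = (
    "Received: from elsewhere.example.org\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)


def digest_without_received(raw):
    """Hash a message after dropping all of its Received headers."""
    msg = email.message_from_string(raw)
    # "del msg[...]" is what invokes msg.__delitem__; for email messages it
    # removes every header of that name and does not complain if none exist.
    del msg['Received']
    return hashlib.md5(msg.as_string().encode('utf-8')).hexdigest()


seen = {}
first = digest_without_received(RAW_A)
seen[first] = 'copy-a'
second = digest_without_received(RAW_B)
is_dup = second in seen  # "in" tests dict keys, replacing has_key()
```

With the Received headers gone, both copies produce the same digest, so
the second one is recognised as a duplicate of the first.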

So here's a slightly different version:

#! /usr/bin/python
from __future__ import with_statement

import os
import sys
import email
import hashlib

dups = {}

for root, dirs, files in os.walk('a'):
    for fname in files:
        path = os.path.join(root, fname)
        with open(path) as fobj:
            msg = email.message_from_file(fobj)
        # Drop every Received header so copies that differ only there match.
        del msg['Received']
        digest = hashlib.md5(msg.as_string()).hexdigest()
        if digest not in dups:
            dups[digest] = path
        else:
            print 'unlink'
            # os.unlink(path)
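
For anyone on Python 3, the same idea might look roughly like the sketch
below.  This is an assumption-laden variant, not the script above: the
function name find_duplicates is made up, files are read as bytes, the
file names are sorted so the result is predictable, and unreadable
files are skipped the way Peter's try/except intended.  It only reports
duplicates and deletes nothing.

```python
import email
import hashlib
import os


def find_duplicates(top):
    """Return paths of messages that repeat an earlier one, ignoring Received."""
    seen = {}        # digest -> first path found with that digest
    duplicates = []  # later paths whose digest was already seen
    for root, dirs, files in os.walk(top):
        for fname in sorted(files):
            path = os.path.join(root, fname)
            try:
                with open(path, 'rb') as fobj:
                    msg = email.message_from_binary_file(fobj)
            except OSError:
                continue  # unreadable file: skip it
            del msg['Received']
            digest = hashlib.md5(msg.as_bytes()).hexdigest()
            if digest not in seen:
                seen[digest] = path
            else:
                duplicates.append(path)
    return duplicates
```

Printing the result of find_duplicates('a') first, and only then
calling os.unlink on each path, keeps the dry-run behaviour of the
version above.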


On 10/31/07, Peter O'Gorman <peter at pogma.com> wrote:
> I mentioned at the meeting that fetchmail went mad and downloaded my
> mail messages repeatedly leaving me with multiple copies of several
> hundred messages. The files were not identical, but only differed in
> "Received" headers. This is the Python script I came up with (it took a
> while, I had to spend a good deal of time reading the docs).
>
> I'm sure that there are better ways to do this, and would not mind a
> critique, but this did work.
>
> Thanks,
> Peter
>
> #! /usr/bin/python
> import os
> import sys
> import email
> import hashlib
>
> dups = {}
>
> for root, dirs, files in os.walk('/home/pogma/Maildir'):
>   for fname in files:
>     try:
>       fobj = open(os.path.join(root,fname))
>       msg = email.message_from_file(fobj)
>       fobj.close()
>     except:
>       fobj.close()
>       continue
>     msg.__delitem__('Received')
>     hash = hashlib.md5(msg.as_string()).hexdigest()
>     if not dups.has_key(hash):
>       dups[hash] = os.path.join(root,fname)
>     else:
>       os.unlink(os.path.join(root,fname))