[Mailman-Users] Filtering out duplicate emails

Mark Sapiro mark at msapiro.net
Thu Feb 11 16:50:54 CET 2010


Chiang Wu wrote:

>Hi. I was wondering if anyone knows a way to filter out, yet still archive,
>new e-mails that have the same content as an already archived e-mail.


You would have to write a custom handler[1] to examine the content of
each message, compare it to some record of earlier messages and, if it
is a 'duplicate', remove 'ToOutgoing' from the pipeline in that
message's metadata.
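
As a rough, untested sketch, such a handler could look something like
the following. The process(mlist, msg, msgdata) entry point is what
Mailman 2.1 calls for each handler. is_duplicate() and the in-memory
_seen_bodies set are hypothetical stand-ins for whatever comparison you
actually want. The sketch also assumes that the remaining pipeline is
the list stored in msgdata['pipeline'] and that this handler runs
before ToOutgoing.

```python
# Hypothetical stand-in for a real lookup: exact bodies seen so far.
# It lives in memory only, so it is lost whenever the runner restarts.
_seen_bodies = set()


def is_duplicate(msg):
    """Return True if this exact body text has been seen before."""
    body = msg.get_payload()
    if not isinstance(body, str):
        # Multipart messages are not handled in this sketch.
        return False
    if body in _seen_bodies:
        return True
    _seen_bodies.add(body)
    return False


def process(mlist, msg, msgdata):
    """Handler entry point, called once per message."""
    if not is_duplicate(msg):
        return
    # Assumption: the handlers still to run are listed here.
    pipeline = msgdata.get('pipeline')
    if pipeline and 'ToOutgoing' in pipeline:
        # Skip delivery only; the other handlers stay in place.
        pipeline.remove('ToOutgoing')
```

Only 'ToOutgoing' is removed here, so whatever else is still in the
pipeline continues to run for the duplicate.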

Comparing it to the archive is problematic because archiving is
asynchronous with incoming message processing, and if two 'duplicates'
arrive close in time, the first may not be archived when you process
the second.

If you only want to catch truly identical content, this handler could
keep a small database of some hash of each message's content and the
time it was processed, for lookup as subsequent messages arrive. If the
'duplicate' content differs only in things like time stamps, you could
filter those out before hashing.
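
To make the hashing idea concrete, here is a minimal sketch of the
lookup. Everything named in it is an assumption for illustration: the
_TIMESTAMP pattern (which only strips HH:MM:SS clock times), the
MAX_AGE retention period, and the plain dict used as the 'database'. A
real handler would need to persist the store somewhere, which is left
out here.

```python
import hashlib
import re
import time

# Hypothetical noise filter: strips "HH:MM:SS" clock times only.
# Adjust it to whatever actually varies between your duplicates.
_TIMESTAMP = re.compile(r'\b\d{2}:\d{2}:\d{2}\b')

# Seconds to remember a hash; an arbitrary choice for this sketch.
MAX_AGE = 24 * 60 * 60


def content_hash(body):
    """Hash the body after removing time stamps and collapsing whitespace."""
    text = _TIMESTAMP.sub('', body)
    text = ' '.join(text.split())
    return hashlib.sha256(text.encode('utf-8')).hexdigest()


def seen_before(store, body, now=None):
    """Return True if the body's hash is already in store, else record it.

    store is a dict mapping hash -> time the content was first processed.
    """
    if now is None:
        now = time.time()
    # Drop expired entries so the store stays small.
    for key in [k for k, t in store.items() if now - t > MAX_AGE]:
        del store[key]
    digest = content_hash(body)
    if digest in store:
        return True
    store[digest] = now
    return False
```

Keeping the process time alongside each hash is what lets old entries
expire, so the same content can go out again after MAX_AGE has passed.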



[1] <http://wiki.list.org/x/l4A9>

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


