Whitelist/verification spam filters

Thu Aug 29 15:11:44 EDT 2002

In article <mailman.1030555727.11821.python-list at python.org>,
David Mertz, Ph.D. <mertz at gnosis.cx> wrote:
>-$P-W$- at verence.demon.co.uk (Paul Wright) wrote:
>
>|Are you aware of the Distributed Checksum Clearinghouse (DCC)? That
>|seems to be a good way of dealing with spam, to my mind.
>
>I sent off a draft, but did not reference DCC.  Perhaps I'll try to add
>that before publication.  But I talked about Pyzor/Razor, and the
>general principle of distributed blacklists.  Pyzor/Razor, btw, use a
>statistical fuzzy digest in cataloging messages.  I guess an individual
>message is diagnosed probabilistically as matching any cataloged spam.
>
>I didn't look at the underlying algorithmic details, but I trust them
>here.  I found zero false positives with Pyzor... but I got a very high
>rate of false negatives on my spam corpus.

The thing I like about the DCC as opposed to Pyzor/Razor is that it does
not rely on humans reporting spam. Since it stores hashes of all
non-local mail passing through a server, any message whose hash has a
sufficiently high count is bulk mail: either mailing list traffic or
spam (hence my comment about needing to whitelist legitimate bulk
email). I hear this works quite well, although my own spam load is small
enough that I haven't bothered to set it up here.
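
To make the counting idea concrete, here is a rough Python sketch. It is
not the DCC's actual protocol or checksum scheme: the class name, the
threshold, and the use of SHA-256 over a whitespace-normalized body are
all assumptions of mine, chosen only to illustrate "count every message,
flag the high counts, whitelist the legitimate bulk senders".

```python
import hashlib
from collections import Counter

BULK_THRESHOLD = 3  # hypothetical cutoff; a real deployment would tune this


class ChecksumCounter:
    """Toy in-memory stand-in for a checksum clearinghouse (illustrative only)."""

    def __init__(self, threshold=BULK_THRESHOLD, whitelist=()):
        self.threshold = threshold
        self.whitelist = set(whitelist)  # senders of legitimate bulk mail
        self.counts = Counter()          # digest -> number of sightings

    @staticmethod
    def digest(body):
        # Assumption: lowercase and collapse whitespace so trivially
        # different copies hash alike. Real fuzzy checksums do far more.
        normalized = " ".join(body.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def report(self, sender, body):
        """Record one sighting of a message and classify it by its count."""
        key = self.digest(body)
        self.counts[key] += 1
        if self.counts[key] < self.threshold:
            return "ok"
        # High count: bulk mail. Whitelisted senders are let through.
        if sender in self.whitelist:
            return "bulk-whitelisted"
        return "bulk-suspect"
```

No human ever reports anything here: every message is counted, and the
whitelist is the only thing separating a mailing list from spam once the
count crosses the threshold.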

-- 
Paul Wright | http://pobox.com/~pw201 |