[Mailman-Developers] mm2.1 - DEFAULT_PLAIN_DIGEST_KEEP_HEADERS

Wed Jan 15 13:37:56 EST 2003

Quoth Phil Barnett <philb at philb.us>:
| On Tuesday 14 January 2003 4:25 pm, Barry A. Warsaw wrote:
...
|> BTW, if anybody else is going to be at the Spam conference, let me
|> know.  I'd love to have a chat about Mailman, spam and other stuffis.
|> I'll be getting in Thursday night.
|
| I'd love to see mailman get some spam recognition and some thresholds that we
| can set to deliberately get rid of the most obvious stuff.

The statistical approach sounds very good to me for lists. I guess the
spambayes project that some notable Python people have been working on
falls in that category; I've been using something called "spamoracle"
myself.

In this case, though, before you start thinking about thresholds, you have
to train the classifier for your list's traffic - furnish it with examples
of good and bad mail.  After a couple of batches, it's surprisingly accurate
with my own mail, and since lists should normally follow a narrower range
of subject matter it ought to do even better there.
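
To make the training step concrete, here is a minimal sketch of a
token-based naive Bayes classifier. It is an illustration only, not the
algorithm spambayes or spamoracle actually uses; the class name TinyBayes,
the tokenizer, and the add-one smoothing are all assumptions made for the
example.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: runs of 3+ word-like characters, lowercased.
TOKEN = re.compile(r"[A-Za-z0-9$'!-]{3,}")


def tokenize(text):
    """Return the set of distinct tokens in a message."""
    return set(t.lower() for t in TOKEN.findall(text))


class TinyBayes:
    """Toy classifier: counts how many good/bad messages contain each token."""

    def __init__(self):
        self.counts = {"good": Counter(), "bad": Counter()}
        self.totals = {"good": 0, "bad": 0}

    def train(self, text, label):
        """Record one example; label must be 'good' or 'bad'."""
        self.counts[label].update(tokenize(text))
        self.totals[label] += 1

    def spam_score(self, text):
        """Return an estimate in (0, 1); higher means more spam-like."""
        # Start from the prior odds, then add per-token log-likelihood
        # ratios, using add-one smoothing so unseen tokens never give zero.
        log_odds = math.log((self.totals["bad"] + 1) / (self.totals["good"] + 1))
        for tok in tokenize(text):
            p_bad = (self.counts["bad"][tok] + 1) / (self.totals["bad"] + 2)
            p_good = (self.counts["good"][tok] + 1) / (self.totals["good"] + 2)
            log_odds += math.log(p_bad / p_good)
        # Clamp before converting to a probability to avoid overflow.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

A threshold would then be applied to spam_score(), but only after the
classifier has seen examples of both kinds of mail from the list itself.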

I haven't thought of a really good way to set up the training, though.  You
want to be able to select particular messages from the archives and feed
them into the classifier intact (headers and everything).  That sounds
like writing a new archive index from the ground up (which might be a
good thing anyway, but I hope I'm wrong and that isn't necessary!)
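
One possible shortcut, sketched here under the assumption that the list's
raw traffic is available as an mbox file, is to pull messages out by
Message-ID with Python's standard mailbox module. The function name
select_messages and the idea of choosing by Message-ID are hypothetical;
where the mbox lives and how the IDs get chosen are left open.

```python
import mailbox


def select_messages(mbox_path, message_ids):
    """Yield the full text (headers and body) of the chosen messages.

    mbox_path   -- path to an existing mbox file (assumed to be the
                   list's raw archive)
    message_ids -- iterable of Message-ID header values to pick out
    """
    wanted = set(message_ids)
    # create=False: fail rather than silently create an empty mailbox.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            msg_id = str(msg.get("Message-ID", "")).strip()
            if msg_id in wanted:
                # as_string() keeps the headers, so the classifier
                # sees the message intact.
                yield msg.as_string()
    finally:
        box.close()
```

This scans the whole file on every call, so it avoids building a new
index at the cost of speed on large archives.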

	Donn Cave, donn at u.washington.edu