[Spambayes] Outlook plugin - training

Rob W.W. Hooft rob@hooft.net
Wed Nov 13 07:51:14 2002


Tim Peters wrote:
> Now for another extreme:  after 10 startup msgs, the system trains itself on
> its own decisions, except that:
> 
> 1. Unsures are correctly classified by the user.
> 2. False negatives are correctly classified by the user.
> 
> But false positives are trained on *as spam*, assuming the user never looks
> at their spam folder.  That takes a long time to run, because
> update_probabilities() is called after every msg.  After 2,100 msgs,
> 
>  2100 trained:1181H+919S wrds:59659 fp:0 fn:0 unsure:26
> 
> and the unsures are growing very slowly now (at 1400 msgs there were 25
> unsures).

Now THIS is the way I'd like to go! I think this is approximately the 
minimum effort we can expect from lazy users (like myself). A user might 
occasionally correct an fp after all, so testing it the way you did 
should give a lower bound on the performance of a minimal-impact system 
that needs hardly any training to get started.
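
To make sure I read the scheme right, here is a rough sketch of which 
label the simulated lazy user ends up training on. The function name and 
the string labels are made up for illustration; this is not code from 
spambayes or from Tim's test driver.

```python
def training_label(decision, truth):
    """Label a message gets trained on under the lazy-user scheme.

    decision: what the filter said, one of "ham", "spam", "unsure"
    truth:    what the message really is, "ham" or "spam"
    (Both the name and the label strings are hypothetical.)
    """
    if decision == "unsure":
        # Rule 1: the user classifies unsures correctly.
        return truth
    if decision == "ham" and truth == "spam":
        # Rule 2: a false negative lands in the inbox, so the user fixes it.
        return truth
    # Everything else is trained on the filter's own decision. That
    # includes false positives, which are trained on as spam because
    # the user never looks in the spam folder.
    return decision
```

The only case where the trained label is wrong is decision "spam" with 
truth "ham", which is exactly the fp case in the quoted description.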

There is one catch: what if the first 10 messages are all ham or all 
spam? Shouldn't we require at least a few of each?
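
Something like the following check is what I have in mind. It is only a 
sketch: the function name is invented, and the minimum of 3 per class is 
my own guess at "a few", not a tested value.

```python
def startup_complete(n_ham, n_spam, min_total=10, min_each=3):
    """Decide whether the hand-trained startup phase can end.

    n_ham, n_spam: messages of each kind the user has classified so far
    min_total:     the 10 startup messages from the quoted experiment
    min_each:      assumed minimum per class (3 is an arbitrary guess)
    """
    enough_messages = (n_ham + n_spam) >= min_total
    enough_of_each = min(n_ham, n_spam) >= min_each
    return enough_messages and enough_of_each
```

With these defaults, 10 ham and no spam keeps the system in startup 
mode, while 7 ham and 3 spam lets self-training begin.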

How would this get started on a mailing list? I guess we could ship 
spambayes with 5 "representative recent spam" messages (or a URL where 
they can be found). A moderator would handle the first few messages to 
the list by hand, and then the filter would kick in. If a message is 
"spam", it can be returned to the sender with a note saying that the 
filter judged it inappropriate based on its wording. "ham" can be posted 
without moderator approval. All "unsure" messages are held for approval. 
The approval interface could have a separate "Spam" classification, but 
that is not really necessary: anything "inappropriate" can go in the 
spam corpus. For "fn"s, the archives should have an option to delete a 
message as spam.
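
As a sketch, the routing for the list could look like this. The function 
name, the action names, and the startup count of 10 are placeholders of 
mine; nothing here is an existing spambayes or list-manager interface.

```python
def list_action(decision, n_moderated, n_startup=10):
    """Pick what the list software does with an incoming message.

    decision:    filter verdict, one of "ham", "spam", "unsure"
    n_moderated: messages a moderator has already handled by hand
    n_startup:   how many messages are moderated before the filter
                 kicks in (10 is an assumed placeholder)
    Returns "post", "bounce", or "hold" (hypothetical action names).
    """
    if n_moderated < n_startup:
        # Startup phase: everything goes to the moderator.
        return "hold"
    actions = {
        "ham": "post",       # posted without moderator approval
        "spam": "bounce",    # returned to the sender with an explanation
        "unsure": "hold",    # held for moderator approval
    }
    return actions[decision]
```

Whatever the moderator decides about a held message would then be fed 
back as training data, with rejected messages going to the spam corpus.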

For now my MUA is so badly integrated that I have yet to train a second 
time....

Rob

-- 
Rob W.W. Hooft  ||  rob@hooft.net  ||  http://www.hooft.net/people/rob/



