[Spambayes] Outlook plugin - training
Rob W.W. Hooft
rob@hooft.net
Wed Nov 13 07:51:14 2002
Tim Peters wrote:
> Now for another extreme: after 10 startup msgs, the system trains itself on
> its own decisions, except that:
>
> 1. Unsures are correctly classified by the user.
> 2. False negatives are correctly classified by the user.
>
> But false positives are trained on *as spam*, assuming the user never looks
> at their spam folder. That takes a long time to run, because
> update_probabilities() is called after every msg. After 2,100 msgs,
>
> 2100 trained:1181H+919S wrds:59659 fp:0 fn:0 unsure:26
>
> and the unsures are growing very slowly now (at 1400 msgs there were 25
> unsures).
Now THIS is the way I'd like to go! I think this is approximately the
minimum effort we can expect from lazy users (like myself). Sometimes an
fp might actually be corrected by the user at some point, but testing it
the way you did should give a lower bound on the performance of a
minimal-impact system that does not need much training to begin with.
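
The training rule from Tim's experiment can be written down in a few lines.
This is only a sketch: training_label and the string labels are made-up
names for illustration, not spambayes code, and "truth" stands for what the
user would say if asked.

```python
def training_label(verdict, truth):
    """Label the filter trains on under the lazy-user scheme.

    verdict: 'ham', 'spam' or 'unsure' (the filter's own decision)
    truth:   'ham' or 'spam' (what the user would say if asked)
    """
    if verdict == "unsure":
        # Rule 1: unsures are classified by the user.
        return truth
    if verdict == "ham" and truth == "spam":
        # Rule 2: a false negative shows up in the inbox, so the user fixes it.
        return "spam"
    # Everything else is trained on the filter's own decision. That includes
    # false positives: ham judged 'spam' is trained on as spam, because the
    # user is assumed never to look at the spam folder.
    return verdict
```

The only case where the training label disagrees with the truth is the
false positive, which is what makes this a pessimistic test.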
There is one catch: what if the first 10 messages are all ham or all
spam? Shouldn't we require at least a few of each?
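
One possible guard against a one-sided startup, as a sketch only: keep the
manual startup phase going until both classes have been seen a few times.
The name startup_complete and the threshold min_each=3 are assumptions for
illustration; only the figure of 10 startup messages comes from the
experiment above.

```python
def startup_complete(n_ham, n_spam, min_total=10, min_each=3):
    """True once manual startup training may stop.

    Requires at least min_total hand-trained messages overall and at
    least min_each of each class (min_each is an assumed value).
    """
    return (n_ham + n_spam >= min_total) and (min(n_ham, n_spam) >= min_each)
```

With these defaults, 10 ham and 0 spam would keep the system in startup
mode, while 7 ham and 3 spam would let self-training begin.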
How would this get started on a mailing list? I guess we could deliver
spambayes with 5 "representative recent spam" messages (or a URL where they
can be found). The first few messages to the list would be moderated by
hand, and then the filter would kick in. If a message is "spam", it can
be returned to the sender, saying that the message has been judged
inappropriate by the filter based on wording. "ham" can be posted
without moderator approval. And all "unsure" messages are held for
approval. The approval interface could have a separate "Spam"
classification, but that is not really necessary: anything
"inappropriate" can go in the spam corpus. For "fn"s, the archives
should have an option to delete a message as spam.
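
The list workflow described above amounts to a three-way dispatch plus a
rule for what the moderator's decision feeds back into training. A minimal
sketch, with hypothetical action names ("post", "bounce", "hold") that do
not correspond to any particular list manager's interface:

```python
# Hypothetical action names; a real list manager would have its own hooks.
ACTIONS = {
    "ham": "post",      # posted without moderator approval
    "spam": "bounce",   # returned to the sender as judged inappropriate
    "unsure": "hold",   # held for moderator approval
}


def list_action(verdict):
    """Map the filter's verdict to what the list does with the message."""
    return ACTIONS[verdict]


def moderator_label(approved):
    """Training label for a held message after the moderator decides.

    No separate "Spam" button is needed: anything rejected as
    inappropriate goes in the spam corpus.
    """
    return "ham" if approved else "spam"
```

A message deleted from the archives as spam (an fn) would be trained on as
"spam" in the same way as a rejected held message.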
For now my MUA is so badly integrated that I have yet to train a second
time....
Rob
--
Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/