[Spambayes] Question about training via the web interface

Kenny Pitt kennypitt at hotmail.com
Thu Apr 15 10:33:56 EDT 2004


Katz, Amir wrote:
> In the Review Messages page, I changed the default for Ham and Spam to
> 'discard' and now the training is also much faster.

The danger is that you might discard a mistake without noticing it.
Personally, I keep the defaults as Defer and use the Ham and Spam
Discard Levels.  I have Ham Discard Level set to 0.001 and Spam Discard
Level set to 99.99.  That way only the messages that had near perfect
classification scores default to discard.
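
To make the effect of those two levels concrete, here is a minimal sketch of the decision as I described it. This is not SpamBayes source code: the function name, the string labels, and the inclusive comparisons are assumptions made for illustration. Only the two threshold values are my actual settings.

```python
# Hypothetical sketch; names and comparison rules are assumptions,
# not taken from the SpamBayes code base.
HAM_DISCARD_LEVEL = 0.001   # percent; ham at or below this defaults to discard
SPAM_DISCARD_LEVEL = 99.99  # percent; spam at or above this defaults to discard


def default_review_action(classification, score_percent):
    """Return the pre-selected review action for one message.

    classification is "ham", "spam", or "unsure"; score_percent is the
    spam score expressed as a percentage from 0 to 100.
    """
    if classification == "ham" and score_percent <= HAM_DISCARD_LEVEL:
        return "discard"
    if classification == "spam" and score_percent >= SPAM_DISCARD_LEVEL:
        return "discard"
    # Everything else, including every unsure, stays on Defer so that a
    # misclassified message still gets looked at before it is thrown away.
    return "defer"
```

With these values, a ham message scoring 2.5% still defaults to Defer, while one scoring 0% defaults to Discard.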

> IMHO, this behavior should be the default. As I suspected, and Tony
> explained, there is no advantage in training on messages that were
> identified correctly.

I don't believe Tony meant to imply that there is *no* advantage to
training on correct messages.  Nobody has proved a "perfect" training
strategy one way or another.  Training only on mistakes and unsures
works well for most people (including myself), but I find that if that's
all I do then my training gets out of balance.  My ham messages are much
easier for SpamBayes to identify because they are more homogeneous, so
most of my mistakes (very rare) and unsures are spam.  Occasionally, I
need to go back and train on some of the properly identified ham
messages in order to keep things balanced.
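
As a rough illustration of that balancing step, the sketch below estimates how many correctly identified ham messages to add to the training set. The function name and the 2:1 spam-to-ham limit are assumptions chosen for the example; nothing here is a SpamBayes setting or a recommended ratio.

```python
import math


def ham_needed_for_balance(n_ham, n_spam, max_ratio=2.0):
    """Return how many extra ham messages to train on so that the number of
    trained spam is at most max_ratio times the number of trained ham.

    max_ratio is a hypothetical tolerance, not a SpamBayes option.
    """
    if n_spam <= max_ratio * n_ham:
        return 0  # already within the tolerated imbalance
    # Smallest ham count that satisfies n_spam <= max_ratio * ham.
    return math.ceil(n_spam / max_ratio) - n_ham
```

For example, with 100 ham and 500 spam already trained, this suggests adding 150 ham, which brings the totals to 250 ham and 500 spam.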

-- 
Kenny Pitt



