[Spambayes] possible feature request: ham training from Unsure folder

Seth Goodman sethg at GoodmanAssociates.com
Tue Nov 4 15:10:08 EST 2003


The following is my guess at how the Unsure and Spam folders work, and if
this is correct, I have a related feature request.  As I am new to
SpamBayes, I welcome your corrections and explanations.

1) If the message spam score is less than the ham threshold, the message is
left in the watched folder.  No training is done with the message.  If the
user then highlights that message and hits the "Delete As Spam" button, the
message is moved to the Spam folder and it is trained on as spam.

2) If the message spam score is greater than the spam threshold, the message
is moved from the watched folder into the Spam folder.  No training is done
with the message.  If the user then highlights that message and hits the
"Recover from spam" button, the message is moved back to its original
watched folder (not necessarily the Inbox) and it is trained on as ham.

3) If the message spam score is between the ham and spam thresholds, the
message is moved from the watched folder to the Unsure folder.  No training
is done with the message.  If the user then highlights that message and hits
the "Delete As Spam" button, the message is moved to the Spam folder and it is
trained on as spam.  If the user manually moves the message to a non-Spam
folder, no training is done with the message.
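The three cases above amount to a simple threshold rule plus two training buttons. The following minimal Python sketch models that understanding. It is not SpamBayes' actual code. The threshold values, the boundary handling (strict `<` and `>`), the folder names, and the `Message`, `file_incoming`, `delete_as_spam` and `recover_from_spam` names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

HAM_THRESHOLD = 0.15   # hypothetical value, for illustration only
SPAM_THRESHOLD = 0.90  # hypothetical value, for illustration only


@dataclass
class Message:
    score: float                      # spam score: 0.0 = ham, 1.0 = spam
    source_folder: str                # watched folder the message arrived in
    folder: Optional[str] = None      # where the message currently lives
    trained_as: Optional[str] = None  # None means "never trained on"


def file_incoming(msg: Message) -> Message:
    """File an incoming message by score; filing alone never trains."""
    if msg.score < HAM_THRESHOLD:
        msg.folder = msg.source_folder  # case 1: left in the watched folder
    elif msg.score > SPAM_THRESHOLD:
        msg.folder = "Spam"             # case 2: moved to Spam
    else:
        msg.folder = "Unsure"           # case 3: moved to Unsure
    return msg


def delete_as_spam(msg: Message) -> Message:
    """'Delete As Spam' button: move to Spam and train as spam."""
    msg.folder = "Spam"
    msg.trained_as = "spam"
    return msg


def recover_from_spam(msg: Message) -> Message:
    """'Recover from spam' button: return to the original folder, train as ham."""
    msg.folder = msg.source_folder
    msg.trained_as = "ham"
    return msg


if __name__ == "__main__":
    m = file_incoming(Message(score=0.97, source_folder="Lists"))
    print(m.folder, m.trained_as)  # Spam None
    m = recover_from_spam(m)
    print(m.folder, m.trained_as)  # Lists ham
```

In this model only the button handlers set `trained_as`. A message the user drags out of Unsure by hand is never trained on, which is the gap described next.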

If this is correct, I think it exposes a minor weakness.  This is based on
the premise that SpamBayes should only train on messages that the system
cannot already classify correctly.  This assumes that the reason for not
training on all messages is, as other folks have pointed out, that
classification accuracy suffers when the training corpus is too large
(around 10K messages).  If these assumptions are correct, then why not train
on *all* messages in the Unsure folder when the user manually classifies
them?  The resulting feature request would be that when the Unsure folder is
selected, SpamBayes should display *two* classification buttons:  "Delete As
Spam" and "Keep As Good".  The "Keep As Good" button would work exactly like
the "Recover from spam" button in the Spam folder, that is, move the message
back to its original watched folder and train on the message as ham.
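The proposed "Keep As Good" button can be sketched in a few lines of Python. This is a hypothetical model of the request, not existing SpamBayes code. The `Message` class, the `buttons_for` and `keep_as_good` names, and the assumption that folders other than Spam and Unsure show only "Delete As Spam" are all illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    source_folder: str                # watched folder the message arrived in
    folder: str                       # folder it currently sits in
    trained_as: Optional[str] = None  # None means "never trained on"


def buttons_for(folder: str) -> List[str]:
    """Toolbar buttons shown for the selected folder under the proposal."""
    if folder == "Spam":
        return ["Recover from spam"]
    if folder == "Unsure":
        return ["Delete As Spam", "Keep As Good"]  # proposed: two buttons
    return ["Delete As Spam"]  # assumption for all other folders


def keep_as_good(msg: Message) -> Message:
    """Proposed button: like 'Recover from spam', but for the Unsure folder."""
    if msg.folder != "Unsure":
        raise ValueError("Keep As Good applies only to messages in Unsure")
    msg.folder = msg.source_folder  # back to the original watched folder
    msg.trained_as = "ham"          # and train on it as ham
    return msg


if __name__ == "__main__":
    m = keep_as_good(Message(source_folder="Lists", folder="Unsure"))
    print(m.folder, m.trained_as)  # Lists ham
```

The handler mirrors "Recover from spam" deliberately, so the two buttons differ only in which folder they are offered for.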

This would have two benefits, one definite and one proposed.  The
definite benefit is that if the message came from a watched folder
other than the Inbox, it could be returned to the proper folder
automatically.  The proposed benefit is that, since the message was
not definitively classified, training on it as ham would improve the future
classification accuracy of the system and result in fewer messages in the
Unsure folder.

Corrections, reactions and education are welcome.

Regards,

Seth Goodman

Goodman Associates, LLC
