[spambayes-dev] A spectacular false positive

Sat Nov 15 18:09:13 EST 2003

Skip Montanaro wrote:
>     Rob> I am now training on all mistakes and unsures, plus all ham scoring
>     Rob> more than 0.02 and all spam scoring less than 0.99. 
> 
> I used to use that sort of scheme as well, but it gets tedious after a while
> and just grows my training database. 
[...]
> Also, when you get two of essentially the same spam, do you train on both?
> I'm trying to be careful now to minimize that sort of duplication.  I have
> so many email addresses feeding into skip at mojam.com that I generally get
> multiples of everything.

I do not get a lot of true duplicates, definitely not in the non-obvious 
spam.

This is my .procmailrc; it indeed has the copy-rule you mention.

LOGFILE=/home/h/hooft/procmail.log
:0 fw:hamlock
| /home/h/hooft/bin/sb_filter.py

# Messages that are so obviously spam that we should not train on them
:0
* ^X-SpamBayes-Classification: spam; 1.00
.ztrain.obvious-spam/

# Messages that are spam but we might want to train on them
:0
* ^X-SpamBayes-Classification: spam
.ztrain.spam/

# Unsure messages must be copied to the unsure folder for training
:0 c
* ^X-SpamBayes-Classification: unsure
.ztrain.unsure/

# Ham scoring 0.02 through 0.19 is eligible for training as well
# (two recipes: one for 0.02-0.09, one for 0.10-0.19)
:0 c
* ^X-SpamBayes-Classification: ham; 0.0[2-9]
.ztrain.ham/

:0 c
* ^X-SpamBayes-Classification: ham; 0.1[0-9]
.ztrain.ham/

##
##
## Split into folders
##
##
:0
* ^List-Id:.*python-announce-list
.python.Announce/

## Etc.
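
For anyone who finds procmail recipes hard to read, the training part of
the rules above can be sketched roughly in Python. This is only an
illustration, not part of SpamBayes: the function name training_folders
and its return convention are made up here, and it assumes the header
carries the score with two decimals, as the recipes do. The second
return value says whether delivery stops at that folder (no "c" flag)
or the message continues on to the folder-splitting rules.

```python
import re

# Hypothetical helper, not part of SpamBayes or procmail.
# Assumes a header of the form "X-SpamBayes-Classification: ham; 0.05".
HEADER_RE = re.compile(
    r"^X-SpamBayes-Classification: (ham|spam|unsure); (\d\.\d\d)",
    re.IGNORECASE | re.MULTILINE,
)

# Ham scores 0.02-0.09 and 0.10-0.19, mirroring the two ham recipes.
TRAINABLE_HAM_RE = re.compile(r"0\.(0[2-9]|1[0-9])$")


def training_folders(headers):
    """Return (folders, delivered) for a block of message headers.

    folders   -- training folders that receive the message
    delivered -- True if delivery stops there (recipe without "c"),
                 False if the message continues to later recipes
    """
    match = HEADER_RE.search(headers)
    if match is None:
        return [], False

    verdict = match.group(1).lower()
    score = match.group(2)

    if verdict == "spam":
        # Spam at 1.00 is too obvious to be worth training on.
        if score == "1.00":
            return [".ztrain.obvious-spam/"], True
        return [".ztrain.spam/"], True

    if verdict == "unsure":
        # Copied for training; the original is still delivered normally.
        return [".ztrain.unsure/"], False

    # Ham: only scores from 0.02 to 0.19 are copied for training.
    if TRAINABLE_HAM_RE.match(score):
        return [".ztrain.ham/"], False
    return [], False
```

With this sketch, a header of "ham; 0.01" yields no training folder,
while "ham; 0.02" is copied to .ztrain.ham/ and still delivered.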

-- 
Rob W.W. Hooft  ||  rob at hooft.net  ||  http://www.hooft.net/people/rob/