[spambayes-dev] A spectacular false positive
Rob Hooft
rob at hooft.net
Sat Nov 15 18:09:13 EST 2003
Skip Montanaro wrote:
> Rob> I am now training on all mistakes and unsures, plus all ham scoring
> Rob> more than 0.02 and all spam scoring less than 0.99.
>
> I used to use that sort of scheme as well, but it gets tedious after a while
> and just grows my training database.
[...]
> Also, when you get two of essentially the same spam, do you train on both?
> I'm trying to be careful now to minimize that sort of duplication. I have
> so many email addresses feeding into skip at mojam.com that I generally get
> multiples of everything.
I do not get many true duplicates, and certainly not among the non-obvious
spam.
This is my .procmailrc; it does indeed have the copy rule you mention.
LOGFILE=/home/h/hooft/procmail.log
:0 fw:hamlock
| /home/h/hooft/bin/sb_filter.py
# Messages that are so obviously spam that we should not train on them
:0
* ^X-SpamBayes-Classification: spam; 1.00
.ztrain.obvious-spam/
# Messages that are spam but we might want to train on them
:0
* ^X-SpamBayes-Classification: spam
.ztrain.spam/
# Unsure messages must be copied to the unsure folder for training
:0 c
* ^X-SpamBayes-Classification: unsure
.ztrain.unsure/
# Ham scoring 0.02 or higher is eligible for training as well
:0 c
* ^X-SpamBayes-Classification: ham; 0.0[2-9]
.ztrain.ham/
:0 c
* ^X-SpamBayes-Classification: ham; 0.1[0-9]
.ztrain.ham/
##
##
## Split into folders
##
##
:0
* ^List-Id:.*python-announce-list
.python.Announce/
## Etc.
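To make the recipe order above easier to check, here is a small Python sketch that replays the five classification recipes against a single X-SpamBayes-Classification header line. It is not part of SpamBayes or procmail. It assumes the header arrives as one line in the "label; score" form matched above, and it uses re.IGNORECASE because procmail conditions are case-insensitive by default. The names RULES and route are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# (condition, folder, is_copy), in the same order as the recipes above.
# "c" recipes deliver a copy and let processing continue; the others
# deliver the message and stop.
RULES = [
    (r"^X-SpamBayes-Classification: spam; 1.00", ".ztrain.obvious-spam/", False),
    (r"^X-SpamBayes-Classification: spam", ".ztrain.spam/", False),
    (r"^X-SpamBayes-Classification: unsure", ".ztrain.unsure/", True),
    (r"^X-SpamBayes-Classification: ham; 0.0[2-9]", ".ztrain.ham/", True),
    (r"^X-SpamBayes-Classification: ham; 0.1[0-9]", ".ztrain.ham/", True),
]


def route(header_line: str) -> Tuple[List[str], Optional[str]]:
    """Return (copied_to, delivered_to) for one classification header.

    delivered_to is None when no delivering recipe matched, i.e. the
    message falls through to the folder-splitting recipes / inbox.
    """
    copied_to: List[str] = []
    for pattern, folder, is_copy in RULES:
        if re.search(pattern, header_line, re.IGNORECASE):
            if is_copy:
                copied_to.append(folder)
            else:
                return copied_to, folder
    return copied_to, None


if __name__ == "__main__":
    for value in ("spam; 1.00", "spam; 0.95", "unsure; 0.50",
                  "ham; 0.15", "ham; 0.01"):
        print(value, route("X-SpamBayes-Classification: " + value))
```

Running it shows the intended split: spam at 1.00 goes to the no-training folder, all other spam goes to .ztrain.spam/, unsure mail and ham from 0.02 to 0.19 get a training copy while still being delivered normally, and ham at 0.00 or 0.01 is never copied.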
--
Rob W.W. Hooft || rob at hooft.net || http://www.hooft.net/people/rob/