[Spambayes] Junk suspects

Tony Meyer tameyer at ihug.co.nz
Thu Aug 19 03:32:13 CEST 2004

> Here are the results from my session so far today.  This was a
> rather large email download, since it covered the entire
> weekend.  My database has about 2600 ham and 2600 spam messages
> in it.  I continually have to mark some unsure messages as
> ham, even though they are from the same person and some of the
> time have the same subject.  It's like it can't get it through
> its head that they're good.  And it seems to constantly mark
> out-of-office replies as spam.
> Processed 2024 messages
> 1201 Good
> 660 Spam
> 163 unsure
> 88 messages manually classified as good, with 0 being false
> positives.  78 messages were manually classified as spam, with 3
> being false.

This does seem a high proportion of unsures.  It's impossible to tell why
messages score what they do without the clues, though.
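
For reference, the proportion can be worked out from the figures quoted
above.  This is only a small arithmetic sketch; the variable names are
mine and purely illustrative.

```python
# Figures copied from the quoted session report.
processed = 2024
good, spam, unsure = 1201, 660, 163

# The three categories should account for every processed message.
assert good + spam + unsure == processed

unsure_rate = unsure / processed
print(f"unsure: {unsure_rate:.1%}")  # prints "unsure: 8.1%"
```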

As a complete guess, have you trained on an out-of-office message as spam?
These sorts of messages often have very little information in them, and are
often extremely similar to each other (people using defaults, people in the
same organisation).  As such, if one was trained as spam, the others will
look like spam too, unless other training says otherwise.
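
To illustrate the effect, here is a deliberately simplified toy sketch.
It is not the SpamBayes algorithm (SpamBayes tokenizes and combines
evidence quite differently); the ToyClassifier class, the averaging rule
and the sample messages are all assumptions made up for illustration.

```python
from collections import Counter


def tokenize(text):
    """Split on whitespace; each distinct word counts once per message."""
    return set(text.lower().split())


class ToyClassifier:
    """Hypothetical per-token spam estimator, for illustration only."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.nspam = 0
        self.nham = 0

    def train(self, text, is_spam):
        tokens = tokenize(text)
        if is_spam:
            self.spam_counts.update(tokens)
            self.nspam += 1
        else:
            self.ham_counts.update(tokens)
            self.nham += 1

    def token_prob(self, token):
        # Fraction of spam (and ham) messages that contained this token.
        spam_ratio = self.spam_counts[token] / self.nspam if self.nspam else 0.0
        ham_ratio = self.ham_counts[token] / self.nham if self.nham else 0.0
        if spam_ratio + ham_ratio == 0:
            return 0.5  # never seen in training: no evidence either way
        return spam_ratio / (spam_ratio + ham_ratio)

    def score(self, text):
        # Toy combining rule: plain average of the per-token estimates.
        probs = [self.token_prob(t) for t in tokenize(text)]
        return sum(probs) / len(probs) if probs else 0.5


clf = ToyClassifier()
clf.train("lunch meeting moved to friday", is_spam=False)
# One out-of-office reply trained as spam by mistake:
clf.train("i am out of the office until monday", is_spam=True)

# A different person's auto-reply shares nearly every token with it.
score = clf.score("i am out of the office until thursday")
print(score)  # prints 0.9375
```

Seven of the eight tokens in the new message have only ever been seen in
the one spam-trained reply, so the message scores as spam even though
nothing else about it was trained.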

To see why a message scores the way it does, select it (before training)
and choose "Show spam clues for this message" from the SpamBayes menu.
This will bring up a message with the list of clues.  You will probably be
able to see for yourself why it scores what it does (which should hint at
the solution), but if you can't, send a copy to the list (with an
explanation) and we can try to explain it to you.

Again, without the clues, we're blind.

Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.
