[Spambayes] Many users on domain coming up as "possibly spam"

Kenny Pitt kennypitt at hotmail.com
Wed Oct 13 22:01:47 CEST 2004


Your problem almost certainly lies here:
 
# ham trained on: 23319
# spam trained on: 370

Based on the imbalance in the number of messages that you have trained
(23319 / 370 is about 63), a single occurrence of a token in spam will have
approximately 63 times as much influence on the overall score as a single
occurrence in ham.
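
To show where that factor comes from, here is a rough sketch of the
per-token arithmetic.  It is not the actual SpamBayes source: the function
name spamprob is hypothetical, and the smoothing constants S = 0.45 and
X = 0.5 are assumptions (I believe they match the default
unknown_word_strength and unknown_word_prob options).  With those values
the sketch agrees with the spamprob column in the clues quoted below.

```python
# Training totals taken from the clues report quoted in this thread.
NHAM = 23319
NSPAM = 370

# Assumed smoothing constants (believed to be the SpamBayes defaults
# for unknown_word_strength and unknown_word_prob).
S = 0.45
X = 0.5


def spamprob(ham_count, spam_count, nham=NHAM, nspam=NSPAM):
    """Hypothetical helper: estimate the spam probability of one token."""
    n = ham_count + spam_count
    if n == 0:
        return X  # never-seen token: fall back to the assumed prior
    # Each count is scaled by the size of its own training set, so one
    # spam occurrence weighs nham / nspam times as much as one ham one.
    ham_ratio = ham_count / nham
    spam_ratio = spam_count / nspam
    raw = spam_ratio / (ham_ratio + spam_ratio)
    # Pull the raw estimate toward X when there is little evidence.
    return (S * X + n * raw) / (S + n)


if __name__ == "__main__":
    print(f"imbalance: {NHAM / NSPAM:.1f}x")
    for token, ham, spam in [("subject:odd", 1, 0),
                             ("from:none", 1559, 12),
                             ("url:en", 2, 5)]:
        print(f"{token:<15} {spamprob(ham, spam):.6f}")
```

Note what happens to 'url:en': it was seen in only 2 ham and 5 spam
messages, yet it scores about 0.96.  With equal training totals (for
example spamprob(2, 5, 1000, 1000)) the same counts would give about 0.70.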
 
For best results, you should train on roughly equal numbers of spam and ham
messages.  An imbalance of 5x to 10x is probably OK for most people, but 63x
is definitely pushing the limits.  Your best bet is probably to delete your
training database and start over from scratch.  If you train only by using
the toolbar buttons when messages are misclassified, instead of by training
on a bunch of existing messages up front, then you'll probably get better
results.
 
-- 
Kenny Pitt
 


  _____  

From: spambayes-bounces at python.org [mailto:spambayes-bounces at python.org] On
Behalf Of Mark Vovchuk
Sent: Wednesday, October 13, 2004 3:18 PM
To: spambayes at python.org
Subject: [Spambayes] Many users on domain coming up as "possibly spam"


Including myself.  Many people in my organization are coming up as either
spam or possibly spam.  I have been trying out SpamBayes as a way to get off
of another product, and this is the last hurdle that I cannot overcome.  I
have had them keep moving each other's messages, and mine, out using the
"recover" button, but to no avail.  This is one of the clues messages that
someone had on an email I sent:
 

Combined Score: 69% (0.686078)

Internal ham score (*H*): 0.229281
Internal spam score (*S*): 0.601437

# ham trained on: 23319
# spam trained on: 370


17 Significant Tokens

token                               spamprob         #ham  #spam
'subject:odd'                       0.155172            1      0
'url:105957'                        0.155172            1      0
'url:indymedia'                     0.155172            1      0
'url:sandiego'                      0.155172            1      0
'from:none'                         0.3267           1559     12
'to:addr:rob'                       0.334402          753      6
'message-id:invalid'                0.37662          1565     15
'reply-to:none'                     0.397052        22874    239
'header:To:1'                       0.608344        14607    360
'url:shtml'                         0.694677           55      2
'url:org'                           0.709459          619     24
'to:2**0'                           0.744606         7133    330
'to:no real name:2**0'              0.804451         3722    243
'proto:http'                        0.825724         3963    298
'url:10'                            0.850336           21      2
'url:2004'                          0.858892            9      1
'url:en'                            0.963873            2      5



 
