[Spambayes] Many users on domain coming up as "possibly spam"
Kenny Pitt
kennypitt at hotmail.com
Wed Oct 13 22:01:47 CEST 2004
Your problem almost certainly lies here:
# ham trained on: 23319
# spam trained on: 370
Because of the imbalance in the number of messages you have trained, a
token seen once in a spam message will carry roughly 63 times (23319 / 370)
as much influence on the score as the same token seen once in a ham message.
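To make the 63x figure concrete, here is a minimal Python sketch of how a SpamBayes-style classifier might turn per-token counts into a spam probability. It assumes Robinson's smoothing with unknown_word_strength = 0.45 and unknown_word_prob = 0.5 (assumed defaults, not stated in this thread), and `spamprob` is a hypothetical helper, not the SpamBayes API. Under those assumptions it reproduces several rows of the clues report quoted below.

```python
# Sketch of a SpamBayes-style token probability (Robinson's smoothing).
# ASSUMPTION: the strength/prior defaults below; spamprob() is a
# hypothetical helper, not the SpamBayes API.

UNKNOWN_WORD_STRENGTH = 0.45  # s: weight given to the prior (assumed default)
UNKNOWN_WORD_PROB = 0.5       # x: prior for a token with little evidence (assumed default)


def spamprob(hamcount: int, spamcount: int, nham: int, nspam: int,
             s: float = UNKNOWN_WORD_STRENGTH,
             x: float = UNKNOWN_WORD_PROB) -> float:
    """Smoothed probability that a message containing this token is spam."""
    n = hamcount + spamcount
    if n == 0:
        return x  # never seen in training: fall back to the prior
    # Each count is normalized by its corpus size, so one spam occurrence
    # weighs nham / nspam times as much as one ham occurrence.
    hamratio = hamcount / nham
    spamratio = spamcount / nspam
    raw = spamratio / (hamratio + spamratio)
    # Pull the raw estimate toward x; rarely seen tokens stay closer to 0.5.
    return (s * x + n * raw) / (s + n)


NHAM, NSPAM = 23319, 370
print(f"imbalance factor: {NHAM / NSPAM:.1f}")               # 63.0
print(f"subject:odd {spamprob(1, 0, NHAM, NSPAM):.6f}")      # 0.155172
print(f"from:none   {spamprob(1559, 12, NHAM, NSPAM):.4f}")  # 0.3267
print(f"url:en      {spamprob(2, 5, NHAM, NSPAM):.4f}")      # 0.9639
```

Note how 'url:en', seen in 2 ham and only 5 spam messages, still scores about 0.96: with so few spam messages trained, each spam occurrence counts for a great deal.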
For best results, you should train on roughly equal numbers of spam and ham
messages. A ratio of 5x to 10x is probably OK for most people, but 63x is
definitely pushing the limits. Your best bet is probably to delete your
training database and start over from scratch. You will likely get better
results if you train only by using the toolbar buttons when messages are
misclassified, instead of training on a large batch of existing messages up
front.
--
Kenny Pitt
_____
From: spambayes-bounces at python.org [mailto:spambayes-bounces at python.org] On
Behalf Of Mark Vovchuk
Sent: Wednesday, October 13, 2004 3:18 PM
To: spambayes at python.org
Subject: [Spambayes] Many users on domain coming up as "possibly spam"
Including myself. Many people in my organization are coming up as either
spam or possibly spam. I have been trying out SpamBayes as a way to get off
another product, and this is the last hurdle that I cannot overcome. I have
had them keep moving each other's messages, and mine, out with the "recover"
button, but to no avail. This is one of the clues reports that someone got
on an email I sent:
Combined Score: 69% (0.686078)
Internal ham score (*H*): 0.229281
Internal spam score (*S*): 0.601437
# ham trained on: 23319
# spam trained on: 370
17 Significant Tokens
token                   spamprob    #ham  #spam
'subject:odd'           0.155172       1      0
'url:105957'            0.155172       1      0
'url:indymedia'         0.155172       1      0
'url:sandiego'          0.155172       1      0
'from:none'             0.3267      1559     12
'to:addr:rob'           0.334402     753      6
'message-id:invalid'    0.37662     1565     15
'reply-to:none'         0.397052   22874    239
'header:To:1'           0.608344   14607    360
'url:shtml'             0.694677      55      2
'url:org'               0.709459     619     24
'to:2**0'               0.744606    7133    330
'to:no real name:2**0'  0.804451    3722    243
'proto:http'            0.825724    3963    298
'url:10'                0.850336      21      2
'url:2004'              0.858892       9      1
'url:en'                0.963873       2      5
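The combined score in this report can be reproduced from the 17 spamprob values with chi-squared (Fisher) combining, which is how SpamBayes is generally described as merging its clues. The sketch below is an illustration under that assumption; `chi2q` and `chi2_combine` are hypothetical names, not the SpamBayes API.

```python
import math

# Sketch of chi-squared (Fisher) combining of per-token probabilities.
# ASSUMPTION: this mirrors how SpamBayes combines clues; chi2q() and
# chi2_combine() are hypothetical names used only for illustration.


def chi2q(x2: float, v: int) -> float:
    """P(chi-squared with v degrees of freedom >= x2), for even v."""
    if v % 2:
        raise ValueError("v must be even")
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi2_combine(probs):
    """Return (combined, H, S) for a list of token spam probabilities."""
    n = len(probs)
    # Ham evidence: many low probabilities make -2*sum(ln p) large.
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    # Spam evidence: the same test applied to 1 - p.
    s = 1.0 - chi2q(-2.0 * sum(math.log1p(-p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0, h, s


# spamprob column of the 17 significant tokens in the report above
CLUES = [0.155172] * 4 + [
    0.3267, 0.334402, 0.37662, 0.397052, 0.608344, 0.694677, 0.709459,
    0.744606, 0.804451, 0.825724, 0.850336, 0.858892, 0.963873,
]

score, h, s = chi2_combine(CLUES)
print(f"H={h:.4f}  S={s:.4f}  combined={score:.4f}")
```

Run on the listed clues, H and S should land close to the report's 0.229281 and 0.601437. The final step is just (S - H + 1) / 2, and (0.601437 - 0.229281 + 1) / 2 = 0.686078, the 69% combined score shown.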