[Spambayes] Problem with POP3 Proxy: Complains about ham/spam ratio

kerhop at oz.net kerhop at oz.net
Mon Aug 9 07:13:29 CEST 2004


A non-text attachment was scrubbed...
Name: SpamBayesServer1.log
Type: application/octet-stream
Size: 724 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20040809/501f9196/SpamBayesServer1.obj
-------------- next part --------------
I am using SpamBayes POP3 Proxy Version 1.0rc2 (June 2004) (binary),
with version 2.3.3 (#51, Feb 13 2004, 14:39:56) [MSC v.1200 32 bit
(Intel)] of Python; my operating system is Windows 5.1.2600.2 (Service
Pack 1).  I have trained 984 ham and 3201 spam.
 
The problem I am having is that if I train too many messages as spam,
it starts complaining about the ham/spam ratio. I know that FAQ 4.9
advises against exceeding a ratio of 2:1. However, based on its own
statistics, around 20% of my mail is ham and the rest is spam, which
is about on par with what is being reported in the news (previous
reports said spam was 60% of all mail; the reported average is now
85%). I've trained a total of around 12000 emails, reviewing both the
spam and the ham, and it's dead on. But to prevent the warning about
the ratio, I've been just discarding the spam/unsure messages and
training only on what it thinks is ham.

Is the 2:1 ratio in the FAQ just a recommendation, or is there a
programmatic reason not to exceed it? If it's just a recommendation,
I'd like to see a future version allow users to specify their own
custom ratio, perhaps with the warning indicating that one has
exceeded that ratio and may want to carefully review spam/ham in the
future to ensure the filter is being trained correctly.
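
To illustrate the kind of check I have in mind, here is a minimal
sketch. It is not SpamBayes code: the function names and the
max_ratio parameter are hypothetical, and the 2.0 default simply
mirrors the 2:1 figure from the FAQ as I understand it.

```python
def imbalance(nham, nspam):
    """Return the larger training count divided by the smaller one."""
    if nham <= 0 or nspam <= 0:
        raise ValueError("need at least one trained message of each kind")
    # float() keeps this a true division on old and new Pythons alike.
    return float(max(nham, nspam)) / min(nham, nspam)


def ratio_warning(nham, nspam, max_ratio=2.0):
    """Return a warning string if the imbalance exceeds max_ratio, else None.

    max_ratio is the hypothetical user-configurable limit.
    """
    ratio = imbalance(nham, nspam)
    if ratio <= max_ratio:
        return None
    return ("Training imbalance is %.1f:1, above the configured limit of "
            "%.1f:1; review recent spam/ham training carefully."
            % (ratio, max_ratio))


# Counts from this report: 984 ham and 3201 spam.
print(ratio_warning(984, 3201))                 # default 2:1 limit: warns
print(ratio_warning(984, 3201, max_ratio=4.0))  # user-chosen limit: None
```

With my current counts the default limit triggers the warning, while
a limit of 4:1 would not.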

