[Spambayes] Feature idea: Autobalancing ham/spam

Thomas Hruska thruska at cubiclesoft.com
Tue Nov 27 15:01:44 CET 2007


I've been thinking about how to balance my ham (10,641 
messages) and spam (60,230 messages).  My plan is to stop training on 
spam and train only on ham until the two are balanced.  It will take a 
while because the incoming ratio of spam to ham is fairly ridiculous.
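
To put numbers on that, here is a small Python sketch of the arithmetic.  
The function name is my own invention for illustration and is not part 
of Spambayes.

```python
def ham_needed_to_balance(ham: int, spam: int) -> int:
    """How many more ham messages must be trained (with no new spam
    trained) before the ham count catches up with the spam count."""
    return max(spam - ham, 0)


ham, spam = 10_641, 60_230

# Current imbalance: spam messages trained per ham message trained.
print(round(spam / ham, 2))              # 5.66

# Additional ham needed if spam training stops entirely.
print(ham_needed_to_balance(ham, spam))  # 49589
```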

While this approach will work, I think it would be nice for Spambayes 
to balance itself automatically whenever the imbalance passes some 
configurable percentage in either direction, so that I wouldn't have to 
worry about it.  There will ALWAYS be more spam than ham.  Most users 
of Spambayes think like me:  keep training on the spam in the hope 
that it will go away completely.  Why concern users with balance issues 
that should be, IMO, handled automatically?
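
Roughly what I have in mind, as a Python sketch.  This is not existing 
Spambayes code: `should_train` and `max_ratio` are hypothetical names, 
and I use a ratio rather than a percentage only to keep the check short.

```python
def should_train(is_spam: bool, nham: int, nspam: int,
                 max_ratio: float = 2.0) -> bool:
    """Decide whether to train on one more message of the given class.

    Training is allowed only if, afterwards, that class holds at most
    max_ratio times as many messages as the other class.
    max_ratio is a hypothetical setting and should be >= 1.
    """
    same = nspam if is_spam else nham    # class of the new message
    other = nham if is_spam else nspam   # the opposite class
    # max(other, 1) lets the very first messages through when the
    # opposite class is still empty.
    return same + 1 <= max_ratio * max(other, 1)


# With my current counts, new spam would be skipped and new ham trained.
print(should_train(True, 10_641, 60_230))   # False
print(should_train(False, 10_641, 60_230))  # True
```

Once the ham count catches up, spam training would resume on its own.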

Another option could be to calculate the ratio of ham to spam and alter 
the "strength" of the ham/spam clues according to the ratio.  However, 
this is probably a bad idea.
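
For the sake of discussion, one way that could look in Python.  This is 
purely an assumption on my part and not how Spambayes computes clue 
probabilities; `damp_clue` is a made-up name.  It pulls a clue toward 
the neutral value 0.5 only when the clue points at the over-trained 
class.

```python
def damp_clue(prob: float, nham: int, nspam: int) -> float:
    """Weaken a word's spam probability if it favors the larger class.

    prob is the word's spam probability (0.0 = hammy, 1.0 = spammy).
    The distance from 0.5 is scaled by smaller_count / larger_count.
    """
    if nham == 0 or nspam == 0 or nham == nspam:
        return prob  # nothing sensible to scale by, or already balanced
    balance = min(nham, nspam) / max(nham, nspam)
    points_at_majority = (prob > 0.5) == (nspam > nham)
    if not points_at_majority:
        return prob  # clues for the smaller class keep full strength
    return 0.5 + (prob - 0.5) * balance


# Twice as much spam as ham: a spammy clue is halved in strength,
# a hammy clue is left alone.
print(damp_clue(0.9, 50, 100))
print(damp_clue(0.1, 50, 100))
```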

I'm running Spambayes 1.0.4.

-- 
Thomas Hruska
CubicleSoft President
Ph: 517-803-4197

*NEW* MyTaskFocus 1.1
Get on task.  Stay on task.

http://www.CubicleSoft.com/MyTaskFocus/


