[Spambayes] proposed changes to hammie & co.

Tim Stone - Four Stones Expressions tim@fourstonesExpressions.com
Fri Nov 22 16:42:03 2002


Well, I've gone and done it... I've touched classifier code.  Either my name 
is now mud, or I really am a part of the community... lol

I added result caching to the _update_probability method in WordInfo (in the 
hammie-playground branch).  I suspect that this will save a lot of time, maybe 
commensurate with what Adam Huff demonstrated.  I don't have a large enough 
corpus to really benchmark this, though, and you'll definitely want to take a 
good look to make sure I haven't goofed anything up.  I certainly didn't 
change any calculations...
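
For anyone who wants the idea without reading the branch, here is a minimal sketch of result caching in general, not the code I checked in.  The class CachedWordInfo, the probability() method, the recomputations counter and the placeholder formula in _compute() are all made up for illustration; the real WordInfo and _update_probability look different.  The point is only that the stored result is reused until one of its inputs changes.

```python
class CachedWordInfo:
    """Hypothetical stand-in for WordInfo; NOT the real spambayes class."""

    def __init__(self):
        self.spamcount = 0
        self.hamcount = 0
        self._cache_key = None     # inputs the cached value was computed from
        self._cached_prob = None   # last computed probability
        self.recomputations = 0    # illustration only: counts real computations

    def probability(self, nspam, nham):
        # The cached value is valid only while every input is unchanged.
        key = (self.spamcount, self.hamcount, nspam, nham)
        if key != self._cache_key:
            self._cached_prob = self._compute(nspam, nham)
            self._cache_key = key
            self.recomputations += 1
        return self._cached_prob

    def _compute(self, nspam, nham):
        # Placeholder formula (an assumption, not the classifier's real math).
        # Assumes nspam > 0 and nham > 0.
        spamratio = self.spamcount / nspam
        hamratio = self.hamcount / nham
        total = spamratio + hamratio
        # 0.5 for a never-seen word is an arbitrary placeholder.
        return spamratio / total if total else 0.5


word = CachedWordInfo()
word.spamcount, word.hamcount = 3, 1
word.probability(10, 10)   # computed
word.probability(10, 10)   # served from the cache
word.spamcount = 4
word.probability(10, 10)   # inputs changed, so computed again
```

Keying the cache on the inputs means nothing has to remember to invalidate it when the counts change, at the cost of one tuple comparison per call.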

On a related note... There ought to be some safeguard against division by zero  
in the hamratio and spamratio calculations.  The system shouldn't blow up with 
a /0 exception, but just peacefully assume some default and go about its 
business.  That's because it's possible that this could be run when only spam 
has been trained on (for example).  Some (regular everyday) user may very well 
make this mistake, which is most likely to occur immediately after 
installation.  A blowup this early will probably just lead them to assume that 
it doesn't work and stop using it.  I'd have fixed it, but I have no 
idea what the peaceful default should be...
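
Something along these lines is what I have in mind, as a rough sketch only.  The helper name safe_ratio and its default parameter are hypothetical, and the 0.0 fallback is purely a placeholder: which value the classifier should really assume is exactly the open question.

```python
def safe_ratio(count, total, default=0.0):
    """Return count / total, or `default` when nothing has been trained.

    Hypothetical helper; the fallback value is an assumption, not a
    recommendation for what the classifier should actually use.
    """
    if total == 0:
        return default
    return count / total


# Only spam has been trained on, so the ham total is still zero.
nspam, nham = 25, 0
spamcount, hamcount = 5, 0

spamratio = safe_ratio(spamcount, nspam)
hamratio = safe_ratio(hamcount, nham)   # no ZeroDivisionError here
```

Making the fallback a parameter keeps the guard in one place, so whoever knows the right default only has to change it once.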
  
- TimS
www.fourstonesExpressions.com 



