December 2002 comp.lang.* stats

Skip Montanaro skip at pobox.com
Sat Jan 25 11:53:42 EST 2003


    Aaron> Spam....hmmm.....don't know what kind of mechanism I could set up
    Aaron> that would filter that easily, or without me blowing the script
    Aaron> size to larger than I would want it to be....if you want to take
    Aaron> over, I'd pass you the code!

Download spambayes (http://spambayes.sf.net/), install it, and train it on a
representative sample of the ham and spam you find in the candidate groups
(100-200 messages should be sufficient).  Then run your counter script and
have spambayes score each message, using a tight ham_cutoff (0.1) and a
reasonable spam_cutoff (0.8): messages scoring below ham_cutoff are ham,
those above spam_cutoff are spam, and everything in between is unsure.
Ignore any message classified as spam and save any message classified as
unsure.  Check the unsures.  If there are too many mistakes there, train on
them and/or adjust your cutoff values to better reflect the nature of the
messages you get.
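A minimal sketch of the three-way decision described above, in Python.  It
assumes a score in [0, 1] where higher means more spammy, and it takes the
scoring call as a parameter (`score`, a hypothetical stand-in; it is not the
spambayes API).  The names `classify` and `tally` are illustrative, and so is
the choice to keep counting unsures until you have reviewed them.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

# Cutoffs suggested above; tune them after reviewing the unsures.
HAM_CUTOFF = 0.1
SPAM_CUTOFF = 0.8


def classify(score: float,
             ham_cutoff: float = HAM_CUTOFF,
             spam_cutoff: float = SPAM_CUTOFF) -> str:
    """Map a spam probability in [0, 1] to 'ham', 'unsure' or 'spam'."""
    if score < ham_cutoff:
        return "ham"
    if score > spam_cutoff:
        return "spam"
    return "unsure"


def tally(messages: Iterable[Tuple[str, str]],
          score: Callable[[str], float]) -> Tuple[Counter, List[Tuple[str, str]]]:
    """Count non-spam messages per group and collect unsures for review.

    `messages` yields (group, message_text) pairs; `score` is a hypothetical
    callable returning the spam probability for one message.
    """
    counts: Counter = Counter()
    unsures: List[Tuple[str, str]] = []
    for group, text in messages:
        verdict = classify(score(text))
        if verdict == "spam":
            continue  # spam is ignored entirely
        counts[group] += 1
        if verdict == "unsure":
            unsures.append((group, text))  # saved for a manual check
    return counts, unsures
```

Passing the scorer in keeps the counting logic independent of how you load
the trained spambayes database, so you can test it with a dict of fake scores
(e.g. `tally(posts, fake_scores.__getitem__)`) before wiring in the real
classifier.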

Skip

More information about the Python-list mailing list