December 2002 comp.lang.* stats
Skip Montanaro
skip at pobox.com
Sat Jan 25 11:53:42 EST 2003
Aaron> Spam....hmmm.....don't know what kind of mechanism I could set up
Aaron> that would filter that easily, or without me blowing the script
Aaron> size to larger than I would want it to be....if you want to take
Aaron> over, I'd pass you the code!
Download spambayes (http://spambayes.sf.net/), install it and train it on a
representative sample of ham and spam you find in the candidate groups
(100-200 messages should be sufficient), then run your counter script, and
ask spambayes to score each message with a tight ham_cutoff (0.1) and a
reasonable spam_cutoff (0.8). Ignore any message classified as spam and
save any message classified as unsure. Check the unsures. If there are too
many mistakes there, train on them and/or adjust your spam/ham cutoff values
to better reflect the nature of the messages you get.
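For illustration, here is a minimal Python sketch of the three-way cutoff rule described above. It is not spambayes code. `score` is a hypothetical callable standing in for however you get a spam probability in [0, 1] out of a trained spambayes classifier. The 0.1 and 0.8 defaults are the values suggested above. Treating a score exactly equal to a cutoff as "unsure" is an assumption. The sketch returns hams and unsures separately; whether reviewed unsures get counted is left to you.

```python
from typing import Callable, Iterable, List, Tuple

HAM_CUTOFF = 0.1   # tight ham cutoff suggested in the post
SPAM_CUTOFF = 0.8  # reasonable spam cutoff suggested in the post


def classify(score: float, ham_cutoff: float = HAM_CUTOFF,
             spam_cutoff: float = SPAM_CUTOFF) -> str:
    """Map a spam probability in [0, 1] to 'ham', 'spam' or 'unsure'.

    Assumption: scores exactly on a cutoff fall into 'unsure'.
    """
    if not 0.0 <= ham_cutoff <= spam_cutoff <= 1.0:
        raise ValueError("need 0 <= ham_cutoff <= spam_cutoff <= 1")
    if score < ham_cutoff:
        return "ham"
    if score > spam_cutoff:
        return "spam"
    return "unsure"


def filter_messages(messages: Iterable[str],
                    score: Callable[[str], float]) -> Tuple[List[str], List[str]]:
    """Split messages into (ham, unsure); spam is dropped entirely.

    `score` is a hypothetical stand-in for a trained spambayes scorer,
    not a real spambayes function.
    """
    ham: List[str] = []
    unsure: List[str] = []
    for msg in messages:
        label = classify(score(msg))
        if label == "spam":
            continue            # ignore spam, as the post suggests
        if label == "unsure":
            unsure.append(msg)  # save for manual checking (and retraining)
        else:
            ham.append(msg)     # safe to feed to the counter script
    return ham, unsure


if __name__ == "__main__":
    # Made-up scores standing in for real spambayes output.
    toy_scores = {"msg-a": 0.02, "msg-b": 0.55, "msg-c": 0.97}
    ham, unsure = filter_messages(toy_scores, toy_scores.__getitem__)
    print(ham, unsure)  # ['msg-a'] ['msg-b']
```

If too many legitimate posts land in the unsure list, you can raise `ham_cutoff` or retrain before rerunning. This is the same tuning loop the post describes.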
Skip
More information about the Python-list mailing list