[Spambayes] Back to language issue (long)

Tim Peters tim_one at email.msn.com
Sat Mar 29 21:46:34 EST 2003


[TimP]
> That won't work:  an unknown word has, as you say, spamprob 0.5
> by default, and all words with spamprob in (.4, .6) are simply
> ignored by default.

[TimS]
> That, I didn't know.  Learn something new all the time...

FYI, it's controlled by option minimum_prob_strength.  You can arrange to
ignore nothing by setting that to 0.0 (the default is 0.1), or to ignore
everything by setting it to 0.5.  Almost all testing reports said 0.1 worked
better than 0.0; one report did a little better at 0.0, but, for the reason
you gave, a setting of 0.0 would leave an exploitable hole in the scoring.
As is, gibberish words have no effect on scoring, but do have a subtler
effect:  they bloat the database size.
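
A rough sketch of the filtering described above, in case it helps.  This is
not the actual Spambayes classifier code:  the function name and the sample
words and probabilities are made up for illustration, and only the option
name minimum_prob_strength and its 0.1 default come from the discussion.

```python
def significant_clues(word_probs, minimum_prob_strength=0.1):
    """Return only the words that would take part in scoring.

    word_probs maps each word to its spamprob.  A word is ignored when
    its spamprob is closer to 0.5 than minimum_prob_strength.
    (Hypothetical helper, not part of Spambayes.)
    """
    return {
        word: prob
        for word, prob in word_probs.items()
        if abs(prob - 0.5) >= minimum_prob_strength
    }


# Made-up probabilities; "xqzzy" stands in for an unknown gibberish word,
# which gets the default spamprob of 0.5.
probs = {"viagra": 0.99, "python": 0.02, "xqzzy": 0.5, "the": 0.55}

default_clues = significant_clues(probs)       # 0.1: drops "xqzzy" and "the"
all_clues = significant_clues(probs, 0.0)      # 0.0: nothing is ignored
no_clues = significant_clues(probs, 0.5)       # 0.5: everything is ignored
```

With the default, the gibberish word never reaches the scoring step, though
it would still take up a slot in the database once trained on.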



