[Spambayes] new option: generate_long_skips

Tim Peters tim.one@comcast.net
Mon, 30 Sep 2002 21:49:09 -0400


[Skip Montanaro]
> ...
> I notice it's suggesting an even lower cutoff now (0.375).
>
> Before:
>
>     -> best cutoff for all runs: 0.4
>     ->     with weighted total 1*30 fp + 17 fn = 47
>     ->     fp rate 1.5%  fn rate 0.85%
>
> After:
>
>     -> best cutoff for all runs: 0.375
>     ->     with weighted total 1*35 fp + 7 fn = 42
>     ->     fp rate 1.75%  fn rate 0.35%

It's suggesting that cutoff *if* what you want to do is minimize the total
number of misclassified messages, without favoring errors of either kind.
Most people here hate false positives more, and in that case you should set
option best_cutoff_fp_weight (which defaults to 1) to how many times worse a
false positive (fp) is to you than a false negative (fn).  See the comments
for that option in Options.py.
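
To make the weighting concrete, here is a minimal sketch of picking a cutoff
by minimizing fp_weight*fp + fn, the same shape as the "weighted total" line
in the output quoted above.  It is not the Spambayes code: the function name
best_cutoff, the ">= cutoff means spam" rule, and the toy scores are all
assumptions made up for illustration.

```python
def best_cutoff(ham_scores, spam_scores, nbuckets=40, fp_weight=1.0):
    """Return (cutoff, fp, fn) minimizing fp_weight*fp + fn.

    Hypothetical helper, not the Spambayes implementation.  Candidate
    cutoffs are the bucket boundaries i/nbuckets, and a message is
    called spam when its score is >= cutoff (an assumption).
    """
    best = None
    for i in range(nbuckets + 1):
        cutoff = i / nbuckets
        fp = sum(1 for s in ham_scores if s >= cutoff)   # ham called spam
        fn = sum(1 for s in spam_scores if s < cutoff)   # spam called ham
        cost = fp_weight * fp + fn
        # Strict "<" keeps the lowest cutoff among equally good ones.
        if best is None or cost < best[0]:
            best = (cost, cutoff, fp, fn)
    return best[1:]


# Toy scores, invented for illustration only; note the overlap around 0.4.
ham = [0.04, 0.11, 0.21, 0.42]
spam = [0.38, 0.39, 0.46, 0.71, 0.91]

even = best_cutoff(ham, spam, fp_weight=1)     # both error kinds cost the same
strict = best_cutoff(ham, spam, fp_weight=10)  # an fp costs ten fn
```

With these toy scores, weight 1 accepts one false positive to avoid any
false negatives, while weight 10 pushes the cutoff up until the false
positive is gone, at the price of two false negatives.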

You have such extreme overlap that you should also boost nbuckets up from
its default 40; the resolution of the automated histogram analysis is
limited by the number of buckets.
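
A small sketch of why the bucket count limits resolution, assuming the
analysis only considers bucket boundaries as candidate cutoffs (that matches
the 0.4 and 0.375 suggestions above, but it is an assumption here, and the
helper names are hypothetical):

```python
def cutoff_step(nbuckets):
    """Spacing between adjacent candidate cutoffs on a 0..1 score scale."""
    return 1.0 / nbuckets


def candidate_cutoffs(nbuckets):
    """Bucket boundaries, assumed to be the only cutoffs the analysis can suggest."""
    return [i / nbuckets for i in range(nbuckets + 1)]


step_default = cutoff_step(40)    # the default nbuckets
step_finer = cutoff_step(200)     # 200 is an arbitrary larger value
default_grid = candidate_cutoffs(40)
```

At the default of 40 buckets the step is 0.025, so 0.375 and 0.4 are
neighboring candidates and nothing between them can be suggested.  More
buckets give a finer grid inside the overlap region.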