[Spambayes] Moving closer to Gary's ideal

Sjoerd Mullender sjoerd@acm.org
Mon, 23 Sep 2002 10:32:21 +0200


On Sat, Sep 21 2002 Tim Peters wrote:

> """
> [Classifier]
> use_robinson_probability: True
> use_robinson_combining: True
> max_discriminators: 1500
> 
> [TestDriver]
> spam_cutoff: 0.50
> """

I tested this against the default options (except that I always run
with count_all_header_lines: True and mine_received_headers: True)
and got these results (left column: defaults, right column: the
options above):

false positive percentages
    0.524  1.047  lost   +99.81%
    0.000  0.524  lost  +(was 0)
    0.524  0.524  tied
    0.524  1.047  lost   +99.81%
    0.524  1.571  lost  +199.81%

won   0 times
tied  1 times
lost  4 times

total unique fp went from 4 to 9 lost  +125.00%
mean fp % went from 0.418848167539 to 0.942408376964 lost  +125.00%

false negative percentages
    1.571  0.000  won   -100.00%
    2.618  2.094  won    -20.02%
    1.571  0.524  won    -66.65%
    0.524  0.524  tied
    1.571  1.047  won    -33.35%

won   4 times
tied  1 times
lost  0 times

total unique fn went from 15 to 8 won    -46.67%
mean fn % went from 1.57068062827 to 0.83769633508 won    -46.67%
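
For anyone checking the arithmetic, the percentage columns above are
consistent with a plain relative change, 100 * (after - before) / before.
The sketch below is only an illustration under that assumption, not the
comparison script's own code; pct_change and fp_runs are names made up
for this example.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent.

    Returns None when `before` is 0, since the relative change is
    undefined there (the table prints "+(was 0)" for that row).
    """
    if before == 0:
        return None
    return 100.0 * (after - before) / before


# (default, changed) false positive percentages copied from the table above.
fp_runs = [(0.524, 1.047), (0.000, 0.524), (0.524, 0.524),
           (0.524, 1.047), (0.524, 1.571)]

for before, after in fp_runs:
    change = pct_change(before, after)
    label = "(was 0)" if change is None else "%+.2f%%" % change
    print("%6.3f %6.3f  %s" % (before, after, label))

# Totals from the summary lines: fp went from 4 to 9, fn from 15 to 8.
print("total unique fp: %+.2f%%" % pct_change(4, 9))
print("total unique fn: %+.2f%%" % pct_change(15, 8))
```

With the numbers above this gives +99.81% and +199.81% for the per-run
rows, and +125.00% and -46.67% for the totals, matching the report.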

The histograms for the default scheme show the usual pattern, but the
histograms with the changed parameters look like this:


Ham distribution for all runs:
955 items; mean 26.28; sample sdev 8.12
* = 3 items
  0.00   0 
  2.50   0 
  5.00   0 
  7.50  40 **************
 10.00   0 
 12.50  61 *********************
 15.00  27 *********
 17.50  47 ****************
 20.00  96 ********************************
 22.50 127 *******************************************
 25.00 155 ****************************************************
 27.50 127 *******************************************
 30.00  96 ********************************
 32.50  65 **********************
 35.00  44 ***************
 37.50  24 ********
 40.00  13 *****
 42.50  13 *****
 45.00   5 **
 47.50   6 **
 50.00   8 ***
 52.50   1 *
 55.00   0 

Spam distribution for all runs:
955 items; mean 68.60; sample sdev 8.43
* = 2 items
 32.50   0 
 35.00   1 *
 37.50   2 *
 40.00   0 
 42.50   0 
 45.00   3 **
 47.50   2 *
 50.00  10 *****
 52.50  15 ********
 55.00  31 ****************
 57.50  70 ***********************************
 60.00  93 ***********************************************
 62.50 117 ***********************************************************
 65.00 109 *******************************************************
 67.50 117 ***********************************************************
 70.00 103 ****************************************************
 72.50  58 *****************************
 75.00  78 ***************************************
 77.50  59 ******************************
 80.00  34 *****************
 82.50  24 ************
 85.00   8 ****
 87.50   6 ***
 90.00  15 ********
 92.50   0 
 95.00   0 
 97.50   0 
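
The error counts can be read straight off these histograms. The sketch
below assumes that each label is the lower edge of a 2.5-wide bucket and
that spam_cutoff: 0.50 corresponds to 50.00 on this scale; both are
assumptions, and the names (ham_hist, spam_hist, the helper functions)
are made up for this example. Empty buckets are left out.

```python
SPAM_CUTOFF = 50.0  # spam_cutoff: 0.50, on the histograms' 0-100 scale

# (bucket label, count) pairs copied from the histograms above.
ham_hist = [
    (7.5, 40), (12.5, 61), (15.0, 27), (17.5, 47), (20.0, 96),
    (22.5, 127), (25.0, 155), (27.5, 127), (30.0, 96), (32.5, 65),
    (35.0, 44), (37.5, 24), (40.0, 13), (42.5, 13), (45.0, 5),
    (47.5, 6), (50.0, 8), (52.5, 1),
]
spam_hist = [
    (35.0, 1), (37.5, 2), (45.0, 3), (47.5, 2), (50.0, 10),
    (52.5, 15), (55.0, 31), (57.5, 70), (60.0, 93), (62.5, 117),
    (65.0, 109), (67.5, 117), (70.0, 103), (72.5, 58), (75.0, 78),
    (77.5, 59), (80.0, 34), (82.5, 24), (85.0, 8), (87.5, 6),
    (90.0, 15),
]


def count_at_or_above(hist, cutoff):
    """Items in buckets whose label is >= cutoff."""
    return sum(n for label, n in hist if label >= cutoff)


def count_below(hist, cutoff):
    """Items in buckets whose label is < cutoff."""
    return sum(n for label, n in hist if label < cutoff)


# Ham scored at or above the cutoff is a false positive;
# spam scored below it is a false negative.
false_positives = count_at_or_above(ham_hist, SPAM_CUTOFF)
false_negatives = count_below(spam_hist, SPAM_CUTOFF)
print("fp:", false_positives, "fn:", false_negatives)
```

Under those assumptions this counts 9 false positives and 8 false
negatives, which agrees with the totals reported above. It only works
for cutoffs that fall on a bucket edge; anything finer needs the
individual scores.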

-- Sjoerd Mullender <sjoerd@acm.org>