[Spambayes] First result from Gary Robinson's ideas

Neale Pickett neale@woozle.org
18 Sep 2002 10:43:32 -0700


So then, Tim Peters <tim.one@comcast.net> is all like:

> If you want to try it (and I sure hope someone does on a smaller test
> corpus!), you need to change two places:

Hey, that's me for sure!

Here ya go:

"""
run1s -> run2s
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams

false positive percentages
    2.500  0.000  won   -100.00%
    2.500  0.000  won   -100.00%
    2.000  0.000  won   -100.00%
    2.000  0.000  won   -100.00%
    1.000  0.000  won   -100.00%

won   5 times
tied  0 times
lost  0 times

total unique fp went from 20 to 0 won   -100.00%
mean fp % went from 2.0 to 0.0 won   -100.00%

false negative percentages
    1.500  100.000  lost  +6566.67%
    1.000  100.000  lost  +9900.00%
    1.500  100.000  lost  +6566.67%
    1.000  100.000  lost  +9900.00%
    1.000  100.000  lost  +9900.00%

won   0 times
tied  0 times
lost  5 times

total unique fn went from 12 to 1000 lost  +8233.33%
mean fn % went from 1.2 to 100.0 lost  +8233.33%
"""

Those are spectacular false negative percentages!  But that's what I
should expect with this change, right?
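
For anyone checking the arithmetic, the "won"/"lost" figures in the table are consistent with a plain relative change from the first column to the second. A minimal sketch follows; pct_change is a name made up here for illustration, not necessarily what the comparison script calls it:

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, as a percentage.

    Hypothetical helper: it reproduces the figures in the table above,
    but it is not taken from the comparison script itself.
    """
    return (after - before) / before * 100.0


# (before, after) pairs copied from the output above.
for before, after in [(1.5, 100.0), (1.0, 100.0), (12, 1000), (2.5, 0.0)]:
    print(f"{before} -> {after}: {pct_change(before, after):+.2f}%")
```

A false negative rate going from 1.5% to 100% is a 98.5-point jump on a base of 1.5, which is where the +6566.67% comes from.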

Here's the breakdown:

"""
Ham distribution for all runs:
* = 17 items
  0.00 977 **********************************************************
  2.50   1 *
  5.00   0 
  7.50   0 
 10.00   0 
 12.50   0 
 15.00   0 
 17.50   0 
 20.00   0 
 22.50   0 
 25.00   0 
 27.50   0 
 30.00   0 
 32.50   0 
 35.00   0 
 37.50   0 
 40.00   0 
 42.50   0 
 45.00   0 
 47.50   0 
 50.00   0 
 52.50   0 
 55.00   0 
 57.50   0 
 60.00   0 
 62.50   0 
 65.00   0 
 67.50   0 
 70.00   0 
 72.50   0 
 75.00   0 
 77.50   1 *
 80.00   0 
 82.50   0 
 85.00   1 *
 87.50   0 
 90.00   1 *
 92.50   0 
 95.00   1 *
 97.50  18 **

Spam distribution for all runs:
* = 17 items
  0.00   8 *
  2.50   0 
  5.00   0 
  7.50   1 *
 10.00   1 *
 12.50   0 
 15.00   0 
 17.50   0 
 20.00   0 
 22.50   0 
 25.00   0 
 27.50   0 
 30.00   0 
 32.50   0 
 35.00   0 
 37.50   0 
 40.00   0 
 42.50   0 
 45.00   1 *
 47.50   0 
 50.00   1 *
 52.50   0 
 55.00   0 
 57.50   0 
 60.00   0 
 62.50   0 
 65.00   0 
 67.50   0 
 70.00   0 
 72.50   0 
 75.00   0 
 77.50   0 
 80.00   0 
 82.50   0 
 85.00   0 
 87.50   0 
 90.00   0 
 92.50   0 
 95.00   0 
 97.50 988 ***********************************************************
"""

Still pretty black and white, but the gray area does appear to have a
few more shades.
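
In case the layout of those histograms isn't obvious, here is a rough sketch of how a display like that could be built. It assumes scores in [0.0, 1.0], 40 buckets of 2.5 points each, a 60-column star budget, and rounding up to whole stars. Those choices are inferred from the output above (977 items at "* = 17 items", and 18 items showing two stars), not read out of the test driver, and the function names and the sample scores are invented:

```python
import math


def histogram(scores, nbuckets=40, width=60):
    """Count scores in [0.0, 1.0] into equal-width buckets.

    Returns (counts, items_per_star). The 60-column width and the
    ceiling rounding are assumptions inferred from the pasted output.
    """
    counts = [0] * nbuckets
    for s in scores:
        # A score of exactly 1.0 goes into the last bucket.
        i = min(int(s * nbuckets), nbuckets - 1)
        counts[i] += 1
    per_star = max(1, math.ceil(max(counts) / width))
    return counts, per_star


def render(counts, per_star):
    """Format the buckets the way the breakdown above is laid out."""
    lines = [f"* = {per_star} items"]
    step = 100.0 / len(counts)
    for i, n in enumerate(counts):
        stars = "*" * math.ceil(n / per_star)
        lines.append(f"{i * step:6.2f} {n:3d} {stars}")
    return "\n".join(lines)


# Invented scores, chosen only so the bucket counts match the ham table.
scores = [0.0] * 977 + [0.03, 0.78, 0.86, 0.91, 0.96] + [0.99] * 18
counts, per_star = histogram(scores)
print(render(counts, per_star))
```

Rounding up means any non-empty bucket gets at least one star, which is why the single stragglers in the middle still show up.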

HTH

Neale