[Spambayes] CL2 results and CL3 results

Tim Peters tim.one@comcast.net
Sat, 05 Oct 2002 19:47:32 -0400


[Brad Clements]
> Uh, they're not 4-lines because my .ini settings aren't default..
>
> but, I've made them four lines now, snip snip.

Thanks!

> CL2 RESULTS
> ...
> -> <stat> Ham scores for all runs: 6500 items; mean 1.53; sdev 9.48
> -> <stat> min 0; median 0; max 100
> * = 104 items
>   0 6321 *************************************************************
>  48  106 **
>  50   52 *
>  98   21 *
>
> -> <stat> Spam scores for all runs: 6500 items; mean 99.17; sdev 6.93
> -> <stat> min 0; median 100; max 100
> * = 105 items
>   0   10 *
>  48   14 *
>  50   75 *
>  98 6401 *************************************************************

> CL3 RESULTS
> ...
> -> <stat> Ham scores for all runs: 6500 items; mean 1.11; sdev 8.23
> -> <stat> min 0; median 0; max 100
> * = 105 items
>   0 6373 *************************************************************
>  48   75 *
>  50   34 *
>  98   18 *
>
> -> <stat> Spam scores for all runs: 6500 items; mean 98.96; sdev 7.46
> -> <stat> min 0; median 100; max 100
> * = 105 items
>   0    7 *
>  48   30 *
>  50   92 *
>  98 6371 *************************************************************

Your test data looks tougher than mine, but three outcomes are the same
as in my runs:

1. CL3 is certain more often than CL2 about ham, and makes fewer
   mistakes when it is certain.

2. CL3 is certain less often than CL2 about spam, but makes fewer
   mistakes when it's certain there too.

3. CL2 and CL3 both have high error rates in their regions of
   uncertainty.  I think that's a Very Good Thing, because it
   means manual review won't be overwhelmingly a waste of time.
   If the error rate in the uncertainty region is just a percent
   or two, I believe manual review will become careless, or even
   skipped.  But if it's actually wrong in its guess a third of
   the time, it will be fun to remind yourself of how much smarter
   you are than a stupid computer <wink>.
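
As a rough check of the three points, the quoted histograms can be tallied
directly. This is only a sketch: the counts are copied from the output above,
but the reading of the buckets is an assumption (0 = certain ham, 98 = certain
spam, 48 = unsure but guessing ham, 50 = unsure but guessing spam), and the
names RUNS and summarize are made up for illustration, not part of the project.

```python
# Histogram counts copied from the quoted CL2 and CL3 runs, keyed by bucket.
# Assumed bucket meanings (not stated in the output itself):
#   0  = certain ham        98 = certain spam
#   48 = unsure, guess ham  50 = unsure, guess spam
RUNS = {
    "CL2": {"ham":  {0: 6321, 48: 106, 50: 52, 98: 21},
            "spam": {0: 10,   48: 14,  50: 75, 98: 6401}},
    "CL3": {"ham":  {0: 6373, 48: 75,  50: 34, 98: 18},
            "spam": {0: 7,    48: 30,  50: 92, 98: 6371}},
}

def summarize(run):
    """Count how often a scheme is certain or unsure, and how often it's wrong."""
    ham, spam = run["ham"], run["spam"]
    return {
        # Everything scored 0 was called ham; the spam in there is a mistake.
        "certain_ham": ham[0] + spam[0],
        "certain_ham_wrong": spam[0],
        # Everything scored 98 was called spam; the ham in there is a mistake.
        "certain_spam": ham[98] + spam[98],
        "certain_spam_wrong": ham[98],
        # The middle buckets are the region of uncertainty.
        "unsure": ham[48] + ham[50] + spam[48] + spam[50],
        # Wrong guesses: ham guessed as spam, plus spam guessed as ham.
        "unsure_wrong": ham[50] + spam[48],
    }

for name, run in RUNS.items():
    s = summarize(run)
    rate = 100.0 * s["unsure_wrong"] / s["unsure"]
    print(name, s, "unsure error rate: %.1f%%" % rate)
```

Under that reading, CL3 is certain about ham 6380 times with 7 mistakes versus
6331 times with 10 for CL2, is certain about spam 6389 times with 18 mistakes
versus 6422 with 21, and both guess wrong a bit over a quarter of the time
when unsure (66 of 247 for CL2, 64 of 231 for CL3).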

If you've still got the clim pickles from these runs, please try Rob's
rmspik.py on them too (I just checked that into the project).