[Spambayes] More experiments with weaktest.py

Tim Peters tim.one@comcast.net
Sun Nov 10 07:27:38 2002


[Rob Hooft]
> These were results of weaktest with default parameters:

Very interesting!  I'll have to try that too.  Note that in my live email
experiment here, I'm also scoring/training msgs in the order they arrive
(apart from the very start, and with small lapses).  It's been reported
before that this helps; I still haven't run a controlled experiment on it,
but my *impression* is that it does help.
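
If I'm reading weaktest right (I haven't run it yet!), it scores each message
against whatever has been trained so far, trains only on the msgs that come
back unsure or wrong, and charges $10 per fp, $1 per fn and 20 cents per
unsure -- that bookkeeping reproduces every total you quote below.  Something
like this, with made-up names standing in for the real weaktest/classifier
code:

    # Rough sketch of the incremental scheme as I understand it -- NOT the
    # actual weaktest.py code; spamprob() and learn() stand in for whatever
    # the real harness calls.
    FP_COST, FN_COST, UNSURE_COST = 10.00, 1.00, 0.20

    def run(messages, classifier, ham_cutoff=0.20, spam_cutoff=0.90):
        # ham_cutoff/spam_cutoff here are Rob's 20 and 90 on a 0-1 scale.
        cost = fp = fn = unsure = 0
        for msg, is_spam in messages:          # in arrival order
            score = classifier.spamprob(msg)
            if score < ham_cutoff:
                verdict = 'ham'
            elif score >= spam_cutoff:
                verdict = 'spam'
            else:
                verdict = 'unsure'
            if verdict == 'unsure':
                unsure += 1
                cost += UNSURE_COST
            elif verdict == 'spam' and not is_spam:
                fp += 1
                cost += FP_COST
            elif verdict == 'ham' and is_spam:
                fn += 1
                cost += FN_COST
            if verdict == 'unsure' or (verdict == 'spam') != is_spam:
                # Train only on unsures and mistakes.
                classifier.learn(msg, is_spam)
        return cost, fp, fn, unsure

Check it against your first run:  2*$10 + 2*$1 + 336*$0.20 = $89.20, and
336 unsures + 4 mistakes = the 340 msgs you trained on.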

>    Total messages 6540 (4800 ham and 1740 spam)
>    Total unsure (including 30 startup messages): 336 (5.1%)
>    Trained on 178 ham and 162 spam
>    fp: 2 fn: 2
>    Total cost: $89.20
>
> If I set the "ham_cutoff" to 10 from 20 to make things more symmetrical
> (spam_cutoff is 90 by default):

The asymmetry is intentional:  most people hate FP more than FN, so by
default I made it harder for a thing to get called spam.  In test after test
we've also seen that spam has a tighter score distribution than ham, which
is a more formal justification for setting the spam cutoff closer to its
endpoint than the ham cutoff.  Setting ham_cutoff as low as 10 is for the
truly paranoid <0.9 wink>.

>    Total messages 6540 (4800 ham and 1740 spam)
>    Total unsure (including 30 startup messages): 442 (6.8%)
>    Trained on 292 ham and 152 spam
>    fp: 2 fn: 0
>    Total cost: $108.40
>
> So the database grows by 30% but it didn't help my cost. The training
> set is now unbalanced 2:1. Set spam_cutoff to 80 and ham_cutoff back to
> the default 20:
>
>    Total messages 6540 (4800 ham and 1740 spam)
>    Total unsure (including 30 startup messages): 304 (4.6%)
>    Trained on 213 ham and 101 spam
>    fp: 7 fn: 3
>    Total cost: $133.80
>
> This reduces the database by only 10%, but at very high fp cost. Same
> 2:1 unbalance in the training set.
> Back to the default 20:90 then, and set the minimum_prob_strength to 0.0:
>
>    Total messages 6540 (4800 ham and 1740 spam)
>    Total unsure (including 30 startup messages): 933 (14.3%)
>    Trained on 497 ham and 437 spam
>    fp: 0 fn: 1
>    Total cost: $187.60
>
> OK, so that didn't work either. How about setting it to 0.2?
>
>    Total messages 6540 (4800 ham and 1740 spam)
>    Total unsure (including 30 startup messages): 304 (4.6%)
>    Trained on 134 ham and 177 spam
>    fp: 2 fn: 5
>    Total cost: $85.80
>
> Hm. That is slightly better. Funny, we are suddenly training on more
> spam than ham.... Back to 0.1 anyway ---the differences are too small---
> and set robinson_probability_x = 0.3 (default is 0.5):
>
>    Total messages 6540 (4800 ham and 1740 spam)
>    Total unsure (including 30 startup messages): 602 (9.2%)
>    Trained on 54 ham and 616 spam
>    fp: 1 fn: 67
>    Total cost: $197.40
>
> Very interesting: this changes the training ratio to 1:12, at huge cost!
> (less than one in three spams was recognized solidly as such).

Note that in calculations I reported a day or two ago, the measured mean of
spamprobs across 3 different corpora was > 0.5, but not by a lot.  Dropping x
to .3 moves an unseen word's spamprob outside the range minimum_prob_strength
ignores, so every "new word" is instantly taken as a ham clue, where before
all new words were ignored by default.  So that it grossly inflated the FN
rate isn't surprising:  everything that will *eventually* become a hapax is
taken as a ham clue the very first time it shows up.
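
Spelled out (the formula as I remember it, not a paste from classifier.py;
s is robinson_probability_s, x is robinson_probability_x, n is the number of
msgs a word has been seen in, and p is its raw count-based spamprob):

    def adjusted_spamprob(p, n, s, x=0.5):
        # Robinson's adjustment: blend the raw spamprob with the prior x,
        # weighted by how often the word has actually been seen.
        return (s * x + n * p) / (s + n)

    # A word never seen before has n == 0, so its adjusted prob is exactly x.
    # With x = 0.5 its distance from 0.5 is 0.0 < minimum_prob_strength (0.1),
    # so it's thrown away; with x = 0.3 the distance is 0.2, so it survives
    # the cut and votes ham on every message it appears in.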

> Wonder what this could do if changed together with the cutoff....
> Let's move it back to 0.5, and try "robinson_probability_s = 0.3":
>
>    Total messages 6540 (4800 ham and 1740 spam)
>    Total unsure (including 30 startup messages): 348 (5.3%)
>    Trained on 237 ham and 120 spam
>    fp: 7 fn: 2
>    Total cost: $141.60
>
> Ouf.

I hope you're at least gaining some respect for how much work went into
picking the defaults <wink>.

> I am back with the defaults, but I'd still like to do an automated
> optimization of everything simultaneously. Might try that.

Now *that* could be a useful system regardless of scheme.  I've tended to do
hill-climbing across one dimension at a time, occasionally moving batches of
params random amounts at once (to see whether that kicks it out of a
stubborn local minimum).
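
Something like this toy loop, where evaluate() stands in for "run weaktest
with these options and report the total cost" (sketch only -- not code I
actually have lying around):

    import random

    def hill_climb(params, steps, evaluate, rounds=50, kick_every=10):
        # params: dict of option name -> starting value
        # steps:  dict of option name -> step size for that dimension
        best = dict(params)
        best_cost = evaluate(best)
        for r in range(1, rounds + 1):
            if r % kick_every == 0:
                # Occasionally jiggle everything at once, to see whether
                # that kicks it out of a stubborn local minimum.
                trial = {k: v + random.uniform(-steps[k], steps[k])
                         for k, v in best.items()}
                cost = evaluate(trial)
                if cost < best_cost:
                    best, best_cost = trial, cost
                continue
            # Otherwise climb one dimension at a time.
            for k in best:
                for delta in (steps[k], -steps[k]):
                    trial = dict(best)
                    trial[k] += delta
                    cost = evaluate(trial)
                    if cost < best_cost:
                        best, best_cost = trial, cost
        return best, best_cost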



