[Spambayes] experimental_ham_spam_imbalance_adjustment result

Tim Peters tim.one at comcast.net
Mon Mar 10 22:10:09 EST 2003


[Meyer, Tony]
> imbalance_false4s.txt -> imbalance_true4s.txt
> -> <stat> tested 372 hams & 48 spams against 983 hams & 155 spams
> -> <stat> tested 333 hams & 56 spams against 1022 hams & 147 spams
> -> <stat> tested 329 hams & 48 spams against 1026 hams & 155 spams
> -> <stat> tested 321 hams & 51 spams against 1034 hams & 152 spams
> -> <stat> tested 372 hams & 48 spams against 983 hams & 155 spams
> -> <stat> tested 333 hams & 56 spams against 1022 hams & 147 spams
> -> <stat> tested 329 hams & 48 spams against 1026 hams & 155 spams
> -> <stat> tested 321 hams & 51 spams against 1034 hams & 152 spams
>
> false positive percentages
>     0.000  0.000  tied
>     0.000  0.000  tied
>     0.000  0.000  tied
>     0.000  0.000  tied
>
> won   0 times
> tied  4 times
> lost  0 times
>
> total unique fp went from 0 to 0 tied
> mean fp % went from 0.0 to 0.0 tied
>
> false negative percentages
>     6.250  6.250  tied
>     0.000  0.000  tied
>     6.250  6.250  tied
>     3.922  3.922  tied
>
> won   0 times
> tied  4 times
> lost  0 times
>
> total unique fn went from 8 to 8 tied
> mean fn % went from 4.10539215686 to 4.10539215686 tied
>
> ham mean                     ham sdev
>    0.39    0.39   +0.00%        3.46    3.46   +0.00%
>    0.09    0.09   +0.00%        0.91    0.91   +0.00%
>    0.65    0.65   +0.00%        4.57    4.57   +0.00%
>    1.40    1.40   +0.00%        7.93    7.93   +0.00%
>
> ham mean and sdev for all runs
>    0.62    0.62   +0.00%        4.87    4.87   +0.00%
>
> spam mean                    spam sdev
>   87.62   87.62   +0.00%       28.34   28.34   +0.00%
>   90.83   90.83   +0.00%       18.01   18.01   +0.00%
>   91.17   91.17   +0.00%       25.61   25.61   +0.00%
>   85.65   85.65   +0.00%       25.97   25.97   +0.00%
>
> spam mean and sdev for all runs
>   88.85   88.85   +0.00%       24.68   24.68   +0.00%
>
> ham/spam mean difference: 88.23 88.23 +0.00
>
> My ham:spam ratio is about 7:1 (Mark's was about 1:2.5).  Forgive
> the newbie question, but does this mean that:
> (a) for my corpus, the option makes no difference at all?
> (b) I haven't tested with a big enough corpus?
> (c) I did something wrong ;)

(d) Something went wrong somewhere.  The listings of means and sdevs are
supremely sensitive to even the tiniest changes:  I've never seen every
difference come out at zero unless the classifiers and tokenizers going into
the two runs were actually identical.
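
One quick check along those lines is to see whether the two summary files
differ anywhere at all.  This is only a minimal sketch:  the helper name
differing_lines is made up for illustration, and it assumes the two files
named in the quoted output are plain text you can read from disk.

```python
import difflib
from pathlib import Path


def differing_lines(path_a, path_b):
    """Return unified-diff lines between two text files (empty if identical)."""
    a = Path(path_a).read_text().splitlines()
    b = Path(path_b).read_text().splitlines()
    # unified_diff yields nothing at all when the two line lists are equal.
    return list(difflib.unified_diff(a, b, str(path_a), str(path_b), lineterm=""))
```

Calling differing_lines("imbalance_false4s.txt", "imbalance_true4s.txt") and
getting back an empty list would suggest the option never took effect in the
second run, rather than that it made no difference.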

Given that you have more ham than spam, the expected effect of enabling the
option is to decrease your FN rate (which, at 4%, is high), and possibly
increase your FP rate (which is 0).
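
For reference, the 4% figure can be reproduced from the quoted output.  The
spam counts below come from the <stat> lines; the per-run miss counts are not
logged directly, so they are inferred here from the quoted percentages and
should be read as an assumption.

```python
# Spams tested per run, taken from the quoted <stat> lines.
spams_tested = [48, 56, 48, 51]
# Misses per run, inferred from the quoted percentages (an assumption).
false_negatives = [3, 0, 3, 2]

# Per-run FN rate in percent, then the unweighted mean across the four runs.
per_run_pct = [100.0 * fn / n for fn, n in zip(false_negatives, spams_tested)]
mean_fn_pct = sum(per_run_pct) / len(per_run_pct)
total_fn = sum(false_negatives)

print([round(p, 3) for p in per_run_pct])
print(round(mean_fn_pct, 4), total_fn)
```

Under those inferred counts the mean works out to about 4.105% with 8 misses
in total, which agrees with the "mean fn %" and "total unique fn" lines in the
quoted output.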



