[spambayes-dev] Another incremental training idea...

Skip Montanaro skip at pobox.com
Thu Jan 15 08:50:22 EST 2004


    Toby> If I'm reading this right, my 7:1 imbalance doesn't hurt me.

    Toby> filename:        unbal        bal1        bal2        bal3
    Toby> ham:spam:   14560:1992   1992:1992   1992:1992   1992:1992
    Toby> fp total:            0           0           1           0
    Toby> fp %:             0.00        0.00        0.05        0.00
    Toby> fn total:           12           6           8           6
    Toby> fn %:             0.60        0.30        0.40        0.30
    Toby> unsure t:          102          21          23          29
    Toby> unsure %:         0.62        0.53        0.58        0.73
    Toby> real cost:      $32.40      $10.20      $22.60      $11.80
    Toby> best cost:      $27.60       $7.00       $9.80       $8.60
    Toby> h mean:           0.11        0.23        0.30        0.32
    Toby> h sdev:           1.89        2.47        3.46        3.26
    Toby> s mean:          96.93       99.06       99.04       99.02
    Toby> s sdev:          12.11        6.88        6.98        7.21
    Toby> mean diff:       96.82       98.83       98.74       98.70
    Toby> k:                6.92       10.57        9.46        9.43

It doesn't seem to have a negative effect on false positives, but it looks
like you will get roughly double the number of false negatives (12 versus
6-8) and 4-5x as many unsures (102 versus 21-29).
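
For anyone who wants to check the arithmetic, here is a minimal sketch that
computes the count ratios from the "fn total" and "unsure t" rows of Toby's
table. The dictionary layout and the helper name ratios_vs_unbalanced are
made up for illustration; they are not part of SpamBayes.

```python
# Counts copied from the "fn total" and "unsure t" rows of Toby's table.
runs = {
    "unbal": {"fn": 12, "unsure": 102},
    "bal1": {"fn": 6, "unsure": 21},
    "bal2": {"fn": 8, "unsure": 23},
    "bal3": {"fn": 6, "unsure": 29},
}


def ratios_vs_unbalanced(runs, key):
    """Hypothetical helper: unbalanced count divided by each balanced count."""
    base = runs["unbal"][key]
    return {name: base / r[key] for name, r in runs.items() if name != "unbal"}


fn_ratios = ratios_vs_unbalanced(runs, "fn")
unsure_ratios = ratios_vs_unbalanced(runs, "unsure")

for name in fn_ratios:
    print(f"{name}: fn x{fn_ratios[name]:.1f}, unsure x{unsure_ratios[name]:.1f}")
```

These are ratios of raw counts only. The unbalanced run covers many more
messages, so the "fn %" and "unsure %" rows are the ones to read for
per-message rates.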

Skip


