[spambayes-dev] Another incremental training idea...

Thu Jan 15 15:00:14 EST 2004

> >     Toby> If I'm reading this right, my 7:1 imbalance doesn't hurt me.
> >
> >     Toby> filename:        unbal        bal1        bal2        bal3
> >     Toby> ham:spam:   14560:1992   1992:1992   1992:1992   1992:1992
> >     Toby> fp total:            0           0           1           0
> >     Toby> fp %:             0.00        0.00        0.05        0.00
> >     Toby> fn total:           12           6           8           6
> >     Toby> fn %:             0.60        0.30        0.40        0.30
> >     Toby> unsure t:          102          21          23          29
> >     Toby> unsure %:         0.62        0.53        0.58        0.73
>
> > [Skip Montanaro]
> > It doesn't seem to have a negative effect on false positives, but it
> > looks like you will get roughly double the number of false negatives
> > and 4-5x as many unsures.
>
> [Toby Dickenson]
> 4x as many unsures, out of a total population that is 4x larger, so no
> overall percentage change. Am I reading that right?

Yes for the unsures, but if I'm reading it right, the fn's are about double
as a percentage (0.60% unbalanced vs. 0.30-0.40% balanced). That follows
because your nspam didn't change across the four data sets (1992 in each),
so Skip's original observation on fn's increasing 2X seems right.
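
For anyone who wants to check the arithmetic, here is a rough Python sketch
that recomputes the percentages from the counts in Toby's table. The
dictionary layout and the rates() helper are just mine for illustration, not
anything from the spambayes test tools, and the denominators (fp over nham,
fn over nspam, unsure over nham + nspam) are my assumption, chosen because
they reproduce the percentages he posted.

```python
# Counts copied from Toby's table; the layout is illustrative only.
runs = {
    "unbal": {"nham": 14560, "nspam": 1992, "fp": 0, "fn": 12, "unsure": 102},
    "bal1":  {"nham": 1992,  "nspam": 1992, "fp": 0, "fn": 6,  "unsure": 21},
    "bal2":  {"nham": 1992,  "nspam": 1992, "fp": 1, "fn": 8,  "unsure": 23},
    "bal3":  {"nham": 1992,  "nspam": 1992, "fp": 0, "fn": 6,  "unsure": 29},
}


def rates(run):
    """Return fp, fn and unsure rates as percentages.

    Assumed denominators: fp is measured against ham, fn against spam,
    and unsure against all messages scored.
    """
    total = run["nham"] + run["nspam"]
    return {
        "fp_pct": 100.0 * run["fp"] / run["nham"],
        "fn_pct": 100.0 * run["fn"] / run["nspam"],
        "unsure_pct": 100.0 * run["unsure"] / total,
    }


for name, run in runs.items():
    pct = rates(run)
    print("%-6s fp %.2f%%  fn %.2f%%  unsure %.2f%%"
          % (name, pct["fp_pct"], pct["fn_pct"], pct["unsure_pct"]))
```

Since nspam is 1992 in every run, the fn percentage tracks the raw fn count
directly, while the unsure percentage is diluted by the larger ham pool in
the unbalanced run.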

--
Seth Goodman

  Humans:   off-list replies to sethg [at] GoodmanAssociates [dot] com

  Spambots: disregard the above