[spambayes-dev] Another incremental training idea...
Seth Goodman
nobody at spamcop.net
Thu Jan 15 15:00:14 EST 2004
> > Toby> If I'm reading this right, my 7:1 imbalance doesn't hurt me.
> >
> > Toby> filename:       unbal        bal1         bal2         bal3
> > Toby> ham:spam:       14560:1992   1992:1992    1992:1992    1992:1992
> > Toby> fp total:       0            0            1            0
> > Toby> fp %:           0.00         0.00         0.05         0.00
> > Toby> fn total:       12           6            8            6
> > Toby> fn %:           0.60         0.30         0.40         0.30
> > Toby> unsure total:   102          21           23           29
> > Toby> unsure %:       0.62         0.53         0.58         0.73
>
> > [Skip Montanaro]
> > It doesn't seem to have a negative effect on false positives, but it looks
> > like you will get roughly double the number of false negatives and 4-5x as
> > many unsures.
>
> [Toby Dickenson]
> 4x as many unsures, out of a total population that is 4x larger, so no
> overall percentage change. Am I reading that right?
Yes, but if I'm reading it right, the fn rate roughly doubles as a percentage.
That seems to be the case because your nspam (1992) didn't change across the
four data sets, so Skip's original observation that fn's increase about 2X
holds.
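To make the two denominators explicit, here is a quick Python sketch using only the counts from Toby's table. The fn rate is measured against spam alone, while the unsure rate is measured against ham plus spam. The dictionary layout and the `rates` helper are just for illustration, not anything in spambayes itself.

```python
# Counts copied from Toby's table: "unbal" is the 14560:1992 run,
# bal1..bal3 are the balanced 1992:1992 runs.
runs = {
    "unbal": {"ham": 14560, "spam": 1992, "fn": 12, "unsure": 102},
    "bal1": {"ham": 1992, "spam": 1992, "fn": 6, "unsure": 21},
    "bal2": {"ham": 1992, "spam": 1992, "fn": 8, "unsure": 23},
    "bal3": {"ham": 1992, "spam": 1992, "fn": 6, "unsure": 29},
}


def rates(run):
    """Return (fn %, unsure %) for one run.

    fn % uses spam as the denominator; unsure % uses all messages.
    """
    fn_pct = 100.0 * run["fn"] / run["spam"]
    unsure_pct = 100.0 * run["unsure"] / (run["ham"] + run["spam"])
    return fn_pct, unsure_pct


for name, run in runs.items():
    fn_pct, unsure_pct = rates(run)
    print(f"{name}: fn {fn_pct:.2f}%  unsure {unsure_pct:.2f}%")

# Compare the unbalanced run against the mean of the three balanced runs.
balanced = [rates(runs[n]) for n in ("bal1", "bal2", "bal3")]
mean_fn = sum(fn for fn, _ in balanced) / len(balanced)
mean_unsure = sum(u for _, u in balanced) / len(balanced)
unbal_fn, unbal_unsure = rates(runs["unbal"])
print(f"fn ratio unbal/balanced: {unbal_fn / mean_fn:.2f}")
print(f"unsure ratio unbal/balanced: {unbal_unsure / mean_unsure:.2f}")
```

The fn ratio comes out around 1.8 while the unsure ratio stays close to 1.0, which is the same split Toby and Skip are describing: the unsure rate is flat, but the fn rate nearly doubles because nspam is fixed.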
--
Seth Goodman
Humans: off-list replies to sethg [at] GoodmanAssociates [dot] com
Spambots: disregard the above