[spambayes-dev] Another incremental training idea...

Seth Goodman nobody at spamcop.net
Tue Jan 13 18:26:47 EST 2004

[Kenny Pitt]
> I've also been kicking around some auto-training ideas hoping for time
> to try them.  One idea I had was based on a "sliding non-edge" scale.
> You would set a max imbalance, say 2:1, beyond which you would train on
> everything on the low side.  As your imbalance falls back below the
> maximum, auto-train would start skipping the "edge" messages with near
> perfect classification scores.  The closer you get to a perfect 1:1
> balance, the closer to the cutoff score the message would need to be
> before it would get auto-trained.  Anyone see any obvious holes in this
> idea?

I don't see any obvious problems.
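
For concreteness, here is a rough sketch of one way to read the
sliding scale. It is not taken from the SpamBayes code. The function
name, the linear interpolation, the cutoff values, and the choice to
skip auto-training for the class that is already ahead are all
assumptions made for illustration.

```python
def should_autotrain(score, is_spam, n_ham, n_spam,
                     ham_cutoff=0.2, spam_cutoff=0.9, max_imbalance=2.0):
    """Decide whether to auto-train on a message (hypothetical helper).

    score    -- classifier score in [0.0, 1.0]
    is_spam  -- True if the message was classified as spam
    n_ham, n_spam -- number of messages trained so far in each class
    """
    # Counts for the message's own class and for the opposite class.
    own, other = (n_spam, n_ham) if is_spam else (n_ham, n_spam)
    if own >= other:
        # Assumption: only the under-trained ("low") side is auto-trained.
        return False

    # Imbalance is > 1.0 here; it grows as the low side falls behind.
    imbalance = other / max(own, 1)
    # 0.0 at a perfect 1:1 balance, 1.0 at or beyond the maximum imbalance.
    fraction = min((imbalance - 1.0) / (max_imbalance - 1.0), 1.0)

    if is_spam:
        distance = score - spam_cutoff   # how far past the cutoff we are
        span = 1.0 - spam_cutoff         # cutoff to a perfect score
    else:
        distance = ham_cutoff - score
        span = ham_cutoff
    if distance < 0:
        return False                     # not confidently classified

    # Train only on messages within the allowed distance of the cutoff;
    # the window shrinks toward the cutoff as the balance approaches 1:1.
    return distance <= fraction * span
```

With 200 ham and 100 spam trained (2:1), every message scored as spam
is trained. At 150:100 only spam scoring within half the distance from
the cutoff to 1.0 is trained, so a 0.99 is skipped and a 0.93 is kept.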

Another related idea is to dynamically move the edge thresholds until the
training ratio averages 1:1.
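
A minimal sketch of that feedback loop, again purely illustrative:
the class name, the fixed step size, the cutoff values, and the use of
cumulative counts as the "average" are assumptions, not anything that
exists in SpamBayes.

```python
class EdgeBalancer:
    """Nudge the edge thresholds so ham and spam training stay near 1:1."""

    def __init__(self, ham_cutoff=0.2, spam_cutoff=0.9, step=0.01):
        self.ham_cutoff = ham_cutoff
        self.spam_cutoff = spam_cutoff
        self.step = step
        # Ham is trained when ham_edge <= score <= ham_cutoff,
        # spam when spam_cutoff <= score <= spam_edge.
        self.ham_edge = 0.0
        self.spam_edge = 1.0
        self.n_ham = 0
        self.n_spam = 0

    def offer(self, score):
        """Return 'ham' or 'spam' if the message is trained, else None."""
        if self.ham_edge <= score <= self.ham_cutoff:
            trained = 'ham'
            self.n_ham += 1
        elif self.spam_cutoff <= score <= self.spam_edge:
            trained = 'spam'
            self.n_spam += 1
        else:
            return None          # unsure, or too close to a perfect score
        self._rebalance()
        return trained

    def _rebalance(self):
        if self.n_ham > self.n_spam:
            # Too much ham: narrow the ham window, widen the spam window.
            self.ham_edge = min(self.ham_edge + self.step, self.ham_cutoff)
            self.spam_edge = min(self.spam_edge + self.step, 1.0)
        elif self.n_spam > self.n_ham:
            # Too much spam: narrow the spam window, widen the ham window.
            self.spam_edge = max(self.spam_edge - self.step, self.spam_cutoff)
            self.ham_edge = max(self.ham_edge - self.step, 0.0)
```

Each call to offer() either trains and adjusts the thresholds or skips
the message. A smaller step makes the thresholds drift more slowly, so
the ratio settles toward 1:1 over a longer run of messages.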

Seth Goodman

