[spambayes-dev] Another incremental training idea...

Wed Jan 14 20:53:52 EST 2004

Tony Meyer wrote:
> Kenny Pitt wrote:
>
>>I've also been kicking around some auto-training ideas hoping for time 
>>to try them.  One idea I had was based on a "sliding non-edge" scale. 
>>You would set a max imbalance, say 2:1, beyond which you would train 
>>on everything on the low side.
>>As your imbalance falls back below the maximum, auto-train 
>>would start skipping the "edge" messages with near perfect 
>>classification scores.  The closer you get to a perfect 1:1 
>>balance, the closer to the cutoff score the message would 
>>need to be before it would get auto-trained.  Anyone see any 
>>obvious holes in this idea?
>>
> 
> I tried almost this with the incremental regime, using a maximum of 2:1 or
> 1:2.  It did pretty consistently worse than the basic nonedge regime.  The
> only difference is that I didn't choose which messages to use if an
> imbalance would be created.  The idea was basically to do nonedge, except if
> there was an imbalance, and then only train messages that move the balance
> closer to 1:1.

It sounds like you are saying that, whenever there was an imbalance, 
non-edge messages on the heavy side were not trained at all.  That seems 
like a key difference from what Kenny described.  Was that the case in 
your test?
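
To pin down what I mean, here is a rough Python sketch of how I read the 
two variants.  Everything in it is my own guess for illustration: the 
function name should_train, the min_band parameter, the linear slide 
between 1:1 and the maximum ratio, and the cutoff defaults are all 
assumptions, not code from the incremental test harness.

```python
def should_train(score, is_spam, nham, nspam,
                 max_ratio=2.0, min_band=0.25,
                 ham_cutoff=0.20, spam_cutoff=0.90,
                 train_heavy_side=True):
    """Hypothetical sketch of a "sliding non-edge" auto-train decision.

    score            -- classifier score in [0, 1] for this message
    is_spam          -- the class the message would be trained as
    nham, nspam      -- messages trained so far in each class
    min_band         -- assumed width of the plain nonedge band at 1:1,
                        as a fraction of the cutoff-to-edge distance
    train_heavy_side -- False skips heavy-side messages entirely while
                        any imbalance exists
    """
    same = nspam if is_spam else nham
    other = nham if is_spam else nspam
    # ratio > 1 means this message's class is the low (under-trained) side.
    ratio = other / max(same, 1)

    # At or beyond the maximum imbalance: train everything on the low side.
    if ratio >= max_ratio:
        return True

    # Heavy side while imbalanced: optionally train nothing at all.
    if ratio < 1.0 and not train_heavy_side:
        return False

    # Distance from the cutoff toward the "perfect" edge, scaled to 0..1.
    # Unsure or misclassified messages clamp to 0 and so always qualify.
    if is_spam:
        dist = (score - spam_cutoff) / (1.0 - spam_cutoff)
    else:
        dist = (ham_cutoff - score) / ham_cutoff
    dist = min(max(dist, 0.0), 1.0)

    # The allowed band slides linearly from min_band at 1:1 (or on the
    # heavy side) up to the full range at max_ratio.
    t = min(max((ratio - 1.0) / (max_ratio - 1.0), 0.0), 1.0)
    band = min_band + (1.0 - min_band) * t
    return dist <= band
```

With train_heavy_side=False, a heavy-side message is skipped whenever 
there is any imbalance, which is how I understand your test.  With 
train_heavy_side=True, heavy-side messages still get ordinary nonedge 
training, which is how I read Kenny's description.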

Eli