[spambayes-dev] train to exhaustion?

Fri Feb 13 09:13:29 EST 2004

Skip Montanaro wrote:
> Kenny> I wonder what would happen if you took an "incorrectness score"
> Kenny> that was the average of the distance from perfect classification
> Kenny> over all messages, and stop if that average ever increases?
> 
> I don't understand what you're suggesting.  What is "perfect
> classification over all messages"?

I was afraid that statement wouldn't be quite clear. <0.5 wink>

I was thinking of perfect classification of a spam being an exact 1.0
score, and perfect classification of a ham being an exact 0.0 score.  If
a ham scores as 0.01, then its distance from perfect is 0.01.  If a spam
scores as 0.99, then its distance from perfect is also 0.01.  The
"incorrectness score" I was considering would take the total of these
distances for all messages as you score them in a single round, and
divide by the total number of messages to get the average distance.
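
A minimal sketch of that calculation, assuming each scored message is
available as a (score, is_spam) pair. The function name and the input
shape are illustrative assumptions, not part of the spambayes code:

```python
def incorrectness_score(scored_messages):
    """Average distance from perfect classification for one round.

    scored_messages: iterable of (score, is_spam) pairs, where score is
    in [0.0, 1.0] and is_spam is the known correct label.
    (Hypothetical input format, chosen for illustration.)
    """
    total = 0.0
    count = 0
    for score, is_spam in scored_messages:
        # A perfect spam scores exactly 1.0, a perfect ham exactly 0.0.
        target = 1.0 if is_spam else 0.0
        total += abs(target - score)
        count += 1
    if count == 0:
        raise ValueError("no messages were scored")
    return total / count
```

With the two examples above, a ham at 0.01 and a spam at 0.99, the
average distance comes out at roughly 0.01.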

What I was wondering was whether a round in which this average distance
is greater than or equal to the previous round's would be a good
indicator that more iterations will not improve the overall accuracy any
further.  The intent is to have some kind of guard condition that
addresses the concern Alex mentioned about getting caught in an infinite
iteration loop.
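
As a rough sketch, the guard could look something like the following.
Here run_round is a hypothetical callable that performs one training
round and returns that round's average distance, and max_rounds is an
extra hard cap I am assuming for safety; neither name comes from the
existing code:

```python
def train_until_no_improvement(run_round, max_rounds=100):
    """Repeat training rounds until the average distance stops shrinking.

    run_round: hypothetical callable that runs one round and returns its
    average distance from perfect classification.
    max_rounds: assumed hard limit, so the loop ends even if the average
    keeps decreasing by tiny amounts.
    Returns the list of per-round averages, including the final round.
    """
    history = []
    previous = None
    for _ in range(max_rounds):
        current = run_round()
        history.append(current)
        # Stop as soon as a round is no better than the one before it.
        if previous is not None and current >= previous:
            break
        previous = current
    return history
```

Note that this stops on the first round that fails to improve, so a
single noisy round would end training early.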

-- 
Kenny Pitt