[Spambayes] spambayes fronting a mailing list?

Mark Hammond mhammond at skippinet.com.au
Fri Jan 17 11:03:44 EST 2003


[Tim1]
> tests), and especially because the *kinds* of spam that remained Unsure
> were maddeningly "obvious" spam (something I don't know how to test
> formally).

This touches on my recent comments about testing training strategies.

If we have a decent framework in place, then "obvious" spam would be
anything that is spam given complete data.

i.e., assume we have 3000 ham and 3000 spam.  My training strategy would be to
perform a complete train over the entire database, and collect "correct"
scores for each item.  We can then test out various training strategies,
watching not only the fp/fn/unsure rates, but also the deviation from the
"correct" score.
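
A rough sketch of that comparison, assuming each message has a score between
0.0 and 1.0 and a known ham/spam label.  The name `evaluate`, the dict layout
and the 0.20/0.90 cutoffs are made up for illustration and are not taken from
the spambayes code:

```python
def evaluate(reference, candidate, is_spam, ham_cutoff=0.20, spam_cutoff=0.90):
    """Compare one training strategy against the complete-train scores.

    reference -- {msg_id: score} after training on the entire database
    candidate -- {msg_id: score} after training with the strategy under test
    is_spam   -- {msg_id: bool}, the known classification of each message
    The cutoffs are illustrative assumptions.
    """
    fp = fn = unsure = 0
    total_deviation = 0.0
    for msg_id, correct_score in reference.items():
        score = candidate[msg_id]
        # How far this strategy drifts from the "correct" score.
        total_deviation += abs(score - correct_score)
        if score >= spam_cutoff:
            if not is_spam[msg_id]:
                fp += 1          # ham scored as spam
        elif score <= ham_cutoff:
            if is_spam[msg_id]:
                fn += 1          # spam scored as ham
        else:
            unsure += 1          # fell between the two cutoffs
    count = len(reference)
    return {
        "fp": fp,
        "fn": fn,
        "unsure": unsure,
        "mean_deviation": total_deviation / count if count else 0.0,
    }
```

Running this once per strategy over the same 6000 messages would give the
fp/fn/unsure counts plus one extra number saying how close the strategy gets
to the complete-data scores.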

> OTOH, in real life now I started with a few hundred random msgs, and
> since then have done *almost* purely mistake-based training.  This may
> not be optimal (and I believe it is not), but leaves so little manual
> classification for me to do that I don't care.

Do you believe we can reasonably formalize some tests for these strategies?

> The important thing now is just that Barry get off his ass
> and start <wink>.

Yeah, 'cos when he is finished there are some nice training strategies I
would like him to work on <wink>

Mark.



