[spambayes-dev] imbalance within ham or spam training sets?
Toby Dickenson
tdickenson at geminidataloggers.com
Mon Nov 3 13:15:21 EST 2003
On Monday 03 November 2003 17:54, Skip Montanaro wrote:
> We know some problems arise if grossly different numbers of ham or spam
> exist in the training databases. I wonder if there might be problems
> within datasets if different numbers of particular hams or spams have been
> used in the training.
Don't scare the new users with talk of problems...
I train using *everything* in my kmail folders. That is 1 part spam, 4 parts
Python mailing lists, 6 parts other lists, 1 part personal email, and 4 parts
automated log messages. No perceptible problems so far.
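
For what it's worth, here is a quick sketch of how that mix works out. It
assumes everything that is not spam counts as ham, and the variable names are
made up for illustration only:

```python
# Parts of each mail category in the training set, as listed above.
parts = {
    "spam": 1,
    "python mailing lists": 4,
    "other lists": 6,
    "personal email": 1,
    "automated log messages": 4,
}

total = sum(parts.values())       # all parts combined
spam_parts = parts["spam"]
ham_parts = total - spam_parts    # assumption: every non-spam category is ham

spam_fraction = spam_parts / total

print(f"ham:spam = {ham_parts}:{spam_parts}")
print(f"spam fraction of training set = {spam_fraction:.2%}")

# Share of the ham side taken up by each ham category.
for name, n in parts.items():
    if name != "spam":
        print(f"{name}: {n / ham_parts:.1%} of ham")
```

So the overall ham:spam imbalance is roughly 15:1, and the ham side is itself
uneven across categories.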
--
Toby Dickenson