[spambayes-dev] imbalance within ham or spam training sets?

Toby Dickenson tdickenson at geminidataloggers.com
Mon Nov 3 13:15:21 EST 2003


On Monday 03 November 2003 17:54, Skip Montanaro wrote:
> We know some problems arise if grossly different numbers of ham or spam
> exist in the training databases.  I wonder if there might be problems
> within datasets if different numbers of particular hams or spams have been
> used in the training.

Don't scare the new users with talk of problems.....

I train using *everything* in my kmail folders. That is 1 part spam, 4 parts 
Python mailing lists, 6 parts other lists, 1 part personal email, and 4 parts 
automated log messages. No perceptible problems so far.

-- 
Toby Dickenson